SPARC Logical Domains: Alternate Service Domains Part 1

In this series we will configure alternate I/O and service domains, with the goal of increasing the serviceability of SPARC T-Series servers without impacting other domains on the hypervisor.  Essentially, this enables rolling maintenance without relying on live migration or downtime.  It is important to note that this is not a cure-all: base firmware updates, for example, are still disruptive, but minor firmware updates, such as those for disks and I/O cards, should be able to be rolled.

In Part One we will go through the initial Logical Domain configuration and map out the devices we have and whether they will belong to the primary or the alternate domain.

In Part Two we will go through the process of creating the alternate domain and assigning the devices to it, thus making it independent of the primary domain.

In Part Three we will create redundant services to support our Logical Domains as well as create a test Logical Domain to utilize these services.

Initial Logical Domain Configuration

I am going to assume that your configuration is currently at the factory default and that you, like me, are using Solaris 11.2 on the hypervisor.

# ldm ls
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    256   511G     0.4%  0.3%  6h 24m

The first thing we need to do is remove some of the resources from the primary domain so that we are able to assign them to other domains.  Since the primary domain is currently active and using these resources, we will enable delayed reconfiguration mode.  This mode accepts all changes and then applies them when that domain reboots (in this case primary, which is the control domain, or the physical machine).

# ldm start-reconf primary
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.

Now we can start reclaiming some of those resources.  I will assign 2 cores (16 virtual CPUs) and 16GB of RAM to the primary domain.

# ldm set-vcpu 16 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
# ldm set-memory 16G primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
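The 16 passed to set-vcpu comes from simple arithmetic.  Here is a small sketch of it, assuming 8 hardware threads per core (true of the T5 as far as I know, and consistent with the 256 virtual CPUs in the factory-default listing, which works out to 32 cores).  THREADS_PER_CORE and vcpus_for_cores are names I made up for illustration.

```shell
# Assumption: 8 hardware threads (virtual CPUs) per core, as on the SPARC T5.
THREADS_PER_CORE=8

# vcpus_for_cores is a hypothetical helper: number of cores in, vCPU count out.
vcpus_for_cores() {
    echo $(( $1 * THREADS_PER_CORE ))
}

# Two cores for the primary domain gives the value used with `ldm set-vcpu`.
vcpus_for_cores 2
```

If your platform has a different thread count per core, adjust THREADS_PER_CORE before relying on the result.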

I like to save configurations often when we are making a lot of changes.

# ldm add-config reduced-resources

Next we will need some services to allow us to provision disks to domains and to connect to the console of domains for the purposes of installation or administration.

# ldm add-vdiskserver primary-vds0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
# ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------

Let's add another configuration to bookmark our progress.

# ldm add-config initial-services

We need to enable the Virtual Network Terminal Server service, which allows us to telnet from the control domain into the consoles of the other domains.

# svcadm enable vntsd

Finally, a reboot will put everything into action.

# reboot

When the system comes back up we should see a drastically different LDM configuration.

Identify PCI Root Complexes

All the T5-2s that I have looked at have been laid out the same: the SAS HBAs and onboard NICs are on pci_0 and pci_3, and the PCI slots are spread across all four root complexes.  So to split everything evenly, pci_0 and pci_2 stay with the primary, while pci_1 and pci_3 go to the alternate.  However, so that you understand how we know this, I will walk you through identifying the root complexes as well as the discrete types of devices.

# ldm ls -l -o physio primary

NAME
primary

IO
    DEVICE                           PSEUDONYM        OPTIONS
    pci@340                          pci_1
    pci@300                          pci_0
    pci@3c0                          pci_3
    pci@380                          pci_2
    pci@340/pci@1/pci@0/pci@4        /SYS/MB/PCIE5
    pci@340/pci@1/pci@0/pci@5        /SYS/MB/PCIE6
    pci@340/pci@1/pci@0/pci@6        /SYS/MB/PCIE7
    pci@300/pci@1/pci@0/pci@4        /SYS/MB/PCIE1
    pci@300/pci@1/pci@0/pci@2        /SYS/MB/SASHBA0
    pci@300/pci@1/pci@0/pci@1        /SYS/MB/NET0
    pci@3c0/pci@1/pci@0/pci@7        /SYS/MB/PCIE8
    pci@3c0/pci@1/pci@0/pci@2        /SYS/MB/SASHBA1
    pci@3c0/pci@1/pci@0/pci@1        /SYS/MB/NET2
    pci@380/pci@1/pci@0/pci@5        /SYS/MB/PCIE2
    pci@380/pci@1/pci@0/pci@6        /SYS/MB/PCIE3
    pci@380/pci@1/pci@0/pci@7        /SYS/MB/PCIE4

This shows us that pci@300 = pci_0, pci@340 = pci_1, pci@380 = pci_2, and pci@3c0 = pci_3.
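If you would rather not read the pairs off by eye, a small filter can pull them out of the listing.  This is only a sketch: root_pseudonyms is a name I made up, and it assumes an awk that understands the + operator (nawk or /usr/xpg4/bin/awk if the default awk on your system does not).

```shell
# root_pseudonyms (hypothetical helper): read `ldm ls -l -o physio primary`
# output on stdin and print one "node pseudonym" pair per root complex.
# Only rows whose first column is a bare pci@NNN node and whose second
# column is a pci_N pseudonym are kept; slot and onboard device rows are skipped.
root_pseudonyms() {
    awk '$1 ~ /^pci@[0-9a-f]+$/ && $2 ~ /^pci_[0-9]+$/ { print $1, $2 }' | sort
}
```

On the control domain this would be used as: ldm ls -l -o physio primary | root_pseudonyms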

Map Local Disk Devices To PCI Root

First we need to determine which disk devices are in the zpool, so that we know which ones cannot be removed.

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 70.3G in 0h8m with 0 errors on Fri Feb 21 05:56:34 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000CCA04385ED60d0  ONLINE       0     0     0
            c0t5000CCA0438568F0d0  ONLINE       0     0     0

errors: No known data errors

Next we must use mpathadm to find the Initiator Port Name.  To do that we must look at slice 0 of c0t5000CCA04385ED60d0.

# mpathadm show lu /dev/rdsk/c0t5000CCA04385ED60d0s0
Logical Unit: /dev/rdsk/c0t5000CCA04385ED60d0s2
mpath-support: libmpscsi_vhci.so
Vendor: HITACHI
Product: H109060SESUN600G
Revision: A606
Name Type: unknown type
Name: 5000cca04385ed60
Asymmetric: no
Current Load Balance: round-robin
Logical Unit Group ID: NA
Auto Failback: on
Auto Probing: NA

Paths:
Initiator Port Name: w5080020001940698
Target Port Name: w5000cca04385ed61
Override Path: NA
Path State: OK
Disabled: no

Target Ports:
Name: w5000cca04385ed61
Relative ID: 0

Our output shows us that the initiator port is w5080020001940698.

# mpathadm show initiator-port w5080020001940698
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@1
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@2
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@8
Initiator Port: w5080020001940698
Transport Type: unknown
OS Device File: /devices/pci@300/pci@1/pci@0/pci@2/scsi@0/iport@4

So we can see that this particular disk is on pci@300, which is pci_0.
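The two lookups above can be chained with a pair of small filters.  This is a rough sketch rather than a finished tool: initiator_ports and initiator_roots are names I made up, and both simply parse the mpathadm output format shown above.

```shell
# initiator_ports (hypothetical helper): read `mpathadm show lu <disk>` output
# on stdin and print each Initiator Port Name once.
initiator_ports() {
    sed -n 's/^[[:space:]]*Initiator Port Name:[[:space:]]*//p' | sort -u
}

# initiator_roots (hypothetical helper): read
# `mpathadm show initiator-port <port>` output on stdin and print the root
# complex node of each OS Device File once.
initiator_roots() {
    sed -n 's|^[[:space:]]*OS Device File:[[:space:]]*/devices/\(pci@[0-9a-f]*\)/.*|\1|p' | sort -u
}
```

Pipe mpathadm show lu /dev/rdsk/c0t5000CCA04385ED60d0s0 into initiator_ports, then pipe mpathadm show initiator-port for each port it prints into initiator_roots.  Repeat for the second disk in the mirror, since both halves of rpool need to stay with the domain that boots from them.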

Map Ethernet Cards To PCI Root

First we must determine the underlying device for each of our network interfaces.

# dladm show-phys net0
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         10000  full      ixgbe0

In this case it is ixgbe0.  We can then look at the device tree to see where it points, which tells us which PCI root this device is connected to.

# ls -l /dev/ixgbe0
lrwxrwxrwx 1 root root 53 Feb 12 2014 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@1/network@0:ixgbe0

Now we can see that it is using pci@300, which translates into pci_0.
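With more than a couple of interfaces it gets tedious to read each symlink by eye.  Here is a minimal sketch of a helper that prints just the root complex node for a /dev entry.  dev_root is a name I made up, and it assumes the symlink target contains /devices/ followed by the root complex, as in the listing above.

```shell
# dev_root (hypothetical helper): print the root complex node that a /dev
# entry resolves to, by parsing the symlink target shown by `ls -l`.
dev_root() {
    ls -l "$1" | sed -n 's|.* -> .*/devices/\(pci@[0-9a-f]*\)/.*|\1|p'
}
```

Running dev_root /dev/ixgbe0 should print pci@300 on my machine.  It prints nothing if the argument is not a symlink into /devices.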

Map Infiniband Cards to PCI Root

Again, let's determine the underlying device name of our InfiniBand interfaces.  On my machine they defaulted to net2 and net3; however, I had previously renamed the links to ib0 and ib1 for simplicity.  This procedure is very similar to the one for Ethernet cards.

# dladm show-phys ib0
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
ib0               Infiniband           up         32000  unknown   ibp0

In this case our device is ibp0.  So now we just check the device tree.

# ls -l /dev/ibp0
lrwxrwxrwx 1 root root 83 Nov 26 07:17 /dev/ibp0 -> ../devices/pci@380/pci@1/pci@0/pci@5/pciex15b3,673c@0/hermon@0/ibport@1,0,ipib:ibp0

We can see by the path that this is using pci@380, which is pci_2.

Map Fibre Channel Cards to PCI Root

Now perhaps we need to have some Fibre Channel HBAs split up as well.  The first thing we must do is look at the cards themselves.

# luxadm -e port
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,qlc@0/fp@0,0:devctl NOT CONNECTED
/devices/pci@300/pci@1/pci@0/pci@4/SUNW,qlc@0,1/fp@0,0:devctl NOT CONNECTED

We can see here that these use pci@300 which is pci_0.
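On a machine with several HBAs, the same reading can be done with a short awk filter.  This is a sketch that assumes the luxadm output format shown above (a /devices path followed by the port state); fc_port_roots is a name I made up.

```shell
# fc_port_roots (hypothetical helper): read `luxadm -e port` output on stdin
# and print "root-complex-node state" for each port.
fc_port_roots() {
    awk 'NF >= 2 {
        # $1 is /devices/pci@NNN/..., so the third "/"-separated piece is the root
        split($1, part, "/")
        state = $2
        for (i = 3; i <= NF; i++) state = state " " $i
        print part[3], state
    }'
}
```

Used as: luxadm -e port | fc_port_roots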

The Plan

Basically we are going to split our PCI devices by even and odd root complex, with even staying with the primary and odd going to the alternate.  On the T5-2, this will result in the PCI-E cards on the left side belonging to the primary and the cards on the right to the alternate.
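To make the even and odd rule concrete, here is a tiny sketch that applies it to each root complex pseudonym.  domain_for is a name I made up, and "alternate" is only a placeholder label, since the alternate domain has not been created or named yet.

```shell
# domain_for (hypothetical helper): apply the even/odd rule to a pci_N pseudonym.
# "alternate" is a placeholder label, not a domain name defined so far.
domain_for() {
    n=${1#pci_}
    case "$n" in
        ''|*[!0-9]*) echo "not a root complex pseudonym: $1" >&2; return 1 ;;
    esac
    if [ $(( n % 2 )) -eq 0 ]; then
        echo primary
    else
        echo alternate
    fi
}

# Show where each of the four root complexes lands under this rule.
for rc in pci_0 pci_1 pci_2 pci_3; do
    echo "$rc -> $(domain_for "$rc")"
done
```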

Here is a diagram of how the physical devices are mapped to PCI Root Complexes.

FIGURE 1.1 – Oracle SPARC T5-2 Front View

FIGURE 1.2 – Oracle SPARC T5-2 Rear View

References

SPARC T5-2 I/O Root Complex Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.z40005601508415.html

SPARC T5-2 Front Panel Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.bbgcddce.html#scrolltoc

SPARC T5-2 Rear Panel Connections – https://docs.oracle.com/cd/E28853_01/html/E28854/pftsm.bbgdeaei.html#scrolltoc