SPARC Logical Domains: Live Migration

One of the ways we accomplish regularly scheduled maintenance is Live Migration. It lets us move workloads from one physical machine to another without a service interruption. Logical Domains handle this much more flexibly than most other hypervisor solutions. There is no complicated cluster setup and no management layer, so you could use any compatible hardware at the drop of a hat.

This live migration article also touches on Alternate Service Domains, a technology I have written about but not yet published (the article should be published within the next week). If you are using Alternate Service Domains, Live Migration is still possible. If you are not, Live Migration is actually easier, because the underlying devices are simpler and therefore simpler to match.

Caveats to Migration

  • Virtual Devices must be accessible on both servers, via the same service name (though the underlying paths may be different).
  • IO Domains cannot be live migrated.
  • Migrations can be either online (“live”) or offline (“cold”); the state of the domain determines which one is performed.
  • During a cold migration, the virtual devices are not checked on the receiving end, so you will need to verify them manually.
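For a cold migration, one rough way to do that manual check is to compare service names. Collect the names on each server (the vds volumes and vsw names shown by ldm list-services) into one text file per host, then compare the two files. The function below is a minimal bash sketch of that comparison. The function name is hypothetical, and extracting the names from ldm list-services into one-name-per-line files is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the manual cold-migration check.
# $1: file with one service name per line, collected on the source
# $2: the same kind of file, collected on the target
# Prints every name present on the source but missing on the target.
missing_on_target() {
  local src_list=$1 tgt_list=$2
  # comm needs sorted input; -23 drops the target-only and common columns,
  # leaving only the lines unique to the source list.
  comm -23 <(sort -u "$src_list") <(sort -u "$tgt_list")
}
```

For example, with the example files source-services.txt and target-services.txt, missing_on_target source-services.txt target-services.txt would print a name such as alternate-vsw0 if it exists only on the source. An empty result only means the names match. It does not prove the underlying paths point at the same storage.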

Live Migration Dry Run

I recommend performing a dry run (the -n flag) before any actual migration. This highlights any configuration problems before the migration happens.

# ldm migrate-domain -n ldom1 root@server
Target Password: 

This reports any errors that an actual migration would hit, without actually migrating the domain or causing you problems.

Live Migration

When you are ready to perform the migration, remove the dry run flag. The migration will still perform the appropriate safety checks to ensure that everything is in order on the receiving end.

# ldm migrate-domain ldom1 root@server
Target Password: 

The migration will now proceed and, unless something goes wrong, the domain will come up on the other system.
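Since I recommend always doing the dry run first, the two steps can be chained so the real migration only starts when the dry run succeeds. This is a minimal bash sketch with a hypothetical function name. It assumes ldm migrate-domain exits with a non-zero status when the dry run finds a problem, and you will be prompted for the target password on both runs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: dry run (-n) first, real migration only if it passed.
# Usage: migrate_after_dry_run <domain> <user@target-host[:new-name]>
migrate_after_dry_run() {
  local domain=$1 target=$2
  if [ -z "$domain" ] || [ -z "$target" ]; then
    echo "usage: migrate_after_dry_run <domain> <user@host[:new-name]>" >&2
    return 2
  fi
  echo "Dry run: ldm migrate-domain -n $domain $target"
  # Assumption: a failed dry run makes ldm exit non-zero.
  if ! ldm migrate-domain -n "$domain" "$target"; then
    echo "Dry run failed; not migrating $domain" >&2
    return 1
  fi
  echo "Dry run passed; migrating $domain to $target"
  ldm migrate-domain "$domain" "$target"
}
```

Usage: migrate_after_dry_run ldom1 root@server. The target is passed through unchanged, so the rename form (root@server:ldom2) works as well.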

Live Migration With Rename

We can also rename the logical domain as part of the migration. To do so, append the new name to the target host, separated by a colon.

# ldm migrate-domain ldom1 root@server:ldom2
Target Password: 

In this case the original name was ldom1 and the new name is ldom2.

Common Errors

Here are some common errors.

Bad Password or No LDM on Target

# ldm migrate-domain ldom1 root@server
Target Password:
Failed to establish connection with ldmd(1m) on target: server
Check that the 'ldmd' service is enabled on the target machine and
that the version supports Domain Migration. Check that the 'xmpp_enabled'
and 'incoming_migration_enabled' properties of the 'ldmd' service on
the target machine are set to 'true' using svccfg(1M).

Probable Fixes – Ensure you are migrating to the correct hypervisor, that the username/password combination is correct, that the user has the appropriate level of access to ldmd, and that ldmd is running.
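The error message names the specific SMF properties, so a quick check on the target can rule out the service side. The sketch below is a hypothetical bash helper to run on the target control domain. It assumes the ldmd FMRI is svc:/ldoms/ldmd:default, and it uses svcs and svcprop to report anything that is not online or not set to true.

```bash
#!/usr/bin/env bash
# Hypothetical check, run on the target control domain.
# Returns non-zero and prints a line for each problem found.
check_ldmd_migration() {
  local fmri=svc:/ldoms/ldmd:default rc=0 prop value state
  # Is the ldmd service itself online?
  state=$(svcs -H -o state "$fmri" 2>/dev/null)
  if [ "$state" != "online" ]; then
    echo "ldmd is not online (state: ${state:-unknown})"
    rc=1
  fi
  # The two properties named in the error message must both be true.
  for prop in xmpp_enabled incoming_migration_enabled; do
    value=$(svcprop -p "ldmd/$prop" "$fmri" 2>/dev/null)
    if [ "$value" != "true" ]; then
      echo "ldmd/$prop is '${value:-unset}', expected 'true'"
      rc=1
    fi
  done
  return $rc
}
```

If a property turns out to be false, set it with svccfg as the error message suggests. For example, run svccfg -s ldmd setprop ldmd/incoming_migration_enabled=true, then svcadm refresh ldmd and svcadm restart ldmd.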

Missing Virtual Disk Server Devices

# ldm migrate-domain ldom1 root@server
Target Password:
The number of volumes in mpgroup 'zfs-ib-nfs' on the target (1) differs
from the number on the source (2)
Domain Migration of LDom ldom1 failed

Probable Fixes – Ensure that the underlying virtual disk devices match. If you are using mpgroups, the entire mpgroup must match on both sides.
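To see which mpgroups differ before you retry, you can compare the volume count per mpgroup on each side. That is the same comparison the error message reports. This is a minimal bash/awk sketch with a hypothetical function name. It assumes you have already collected one "mpgroup volume" pair per line for each host (for example from ldm list-services); it does not gather them for you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare per-mpgroup volume counts.
# $1: source file, $2: target file, each with "mpgroup volume" per line.
# Prints each mismatched mpgroup and returns non-zero if any differ.
compare_mpgroups() {
  local src=$1 tgt=$2
  awk '
    # "side" is assigned on the command line before each file is read,
    # so this also works with an empty file or the same file twice.
    NF { seen[$1] = 1; if (side == "src") src[$1]++; else tgt[$1]++ }
    END {
      bad = 0
      for (g in seen)
        if (src[g] + 0 != tgt[g] + 0) {
          printf "mpgroup %s: source has %d volume(s), target has %d\n", g, src[g], tgt[g]
          bad = 1
        }
      exit bad
    }' side=src "$src" side=tgt "$tgt"
}
```

The output for the example above would name zfs-ib-nfs with 2 volumes on the source and 1 on the target. The order of multiple mismatched groups is not guaranteed.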

Missing Virtual Switch Device

# ldm migrate-domain ldom1 root@server
Target Password:
Failed to find required vsw alternate-vsw0 on target machine
Domain Migration of LDom logdom1 failed

Probable Fixes – Ensure that the underlying virtual switch devices match on both systems.

Check Migration Progress

One thing to keep in mind is that the hypervisor being evacuated (the source) controls the migration process. Because it is authoritative, that is where status should be checked.

source# ldm list -o status ldom1
NAME
logdom1 

STATUS
 OPERATION PROGRESS TARGET
 migration 20% 172.16.24.101:logdom1

It can, however, also be checked on the receiving end, though the output looks a little different.

target# ldm list -o status logdom1
NAME
logdom1

STATUS
 OPERATION PROGRESS SOURCE
 migration 30% ak00176306-primary

The big thing to notice is that this side shows the SOURCE instead of the TARGET. Also, if the domain was renamed as part of the migration, it is listed under its new name.
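If you want to watch a migration from a script rather than re-running the command by hand, you can pull the PROGRESS column out of that output. This is a minimal bash sketch with hypothetical function names, and it assumes the column layout shown above. The polling loop stops as soon as no migration operation is listed for the domain, whether it completed, failed, or was cancelled.

```bash
#!/usr/bin/env bash
# Hypothetical parser: reads `ldm list -o status <domain>` output on stdin
# and prints the PROGRESS value (without the % sign) of the first
# "migration" line, or nothing if no migration is listed.
migration_progress() {
  awk '$1 == "migration" { sub(/%$/, "", $2); print $2; exit }'
}

# Hypothetical poller: prints progress every 10 seconds while a migration
# is listed for the domain.
watch_migration() {
  local domain=$1 pct
  while pct=$(ldm list -o status "$domain" 2>/dev/null | migration_progress) && [ -n "$pct" ]; do
    echo "$domain: ${pct}% migrated"
    sleep 10
  done
}
```

Run watch_migration ldom1 on the source, since that is the authoritative side.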

Cancel Migration

If you need to cancel a migration, do so on the hypervisor being evacuated, since it is authoritative.

# ldm cancel-operation migration ldom1
Domain Migration of ldom1 has been cancelled

This lets you cancel an accidentally started migration. In practice, though, most migrations you would want to stop will fail with an error before you need to do this.
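If you script the cancellation, it is worth confirming that a migration is actually listed before issuing it. Here is a minimal bash sketch with a hypothetical function name. Run it on the source side; it assumes the ldm list -o status layout shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cancel only when a migration is listed for the domain.
cancel_migration_if_running() {
  local domain=$1
  # awk exits 0 only if some line starts with "migration".
  if ldm list -o status "$domain" | awk '$1 == "migration" { found = 1 } END { exit !found }'; then
    ldm cancel-operation migration "$domain"
  else
    echo "No migration in progress for $domain" >&2
    return 1
  fi
}
```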

Cross CPU Considerations

By default, logical domains are created to use the specific CPU features of the hardware they run on. As a result, live migration by default only works between the exact same CPU type and generation. However, if we change the CPU architecture of the domain (the cpu-arch property), we can relax this restriction. The available options are:

Native (native) – Allows migration between the same CPU type and generation.

Generic (generic) – Uses the most generic processor feature set to allow the widest live migration capabilities.

Migration Class 1 (migration-class1) – Allows migration between T4, T5 and M5 server classes (M10 is also supported depending on firmware version).

SPARC64 Class 1 (sparc64-class1) – Allows migration between Fujitsu M10 servers.

Here is an example of how you would change the CPU architecture of a domain. I personally recommend using this sparingly. Instead, build your hardware infrastructure so that you have spare capacity on the same generation of hardware. However, in certain circumstances this can make a lot of sense, provided the performance implications are acceptable.

# ldm set-domain cpu-arch=migration-class1 ldom1
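If you want the change wrapped in some validation, here is a minimal bash sketch with a hypothetical function name. It only accepts native, generic, migration-class1 and sparc64-class1, the four options listed above. It assumes the domain is currently running and must be stopped and unbound before cpu-arch can be changed. Treat it as an outage for that domain, and confirm the requirement for your ldm version before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the cpu-arch value, then apply it.
# Assumption: the domain is running and must be inactive for the change.
set_cpu_arch() {
  local domain=$1 arch=$2
  case "$arch" in
    native|generic|migration-class1|sparc64-class1) ;;
    *) echo "unsupported cpu-arch: $arch" >&2; return 2 ;;
  esac
  # Stop and unbind, change the property, then bind and start again.
  # The chain stops at the first ldm command that fails.
  ldm stop-domain "$domain" &&
    ldm unbind-domain "$domain" &&
    ldm set-domain cpu-arch="$arch" "$domain" &&
    ldm bind-domain "$domain" &&
    ldm start-domain "$domain"
}
```

Usage: set_cpu_arch ldom1 migration-class1.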

I personally wouldn’t count on the Cross-CPU functionality, although in some cases it might make sense for your situation. Either way, Logical Domains handle Live Migration very effectively, and it adds a lot of value.

2 thoughts on “SPARC Logical Domains: Live Migration”

  1. Sujit Choudhury

    You have not mentioned what kind of storage you are using. I am planning the kind of setup outlined in your article. I would be grateful if you could say more about your storage. Our storage is a NetApp filer.

    1. matthew.mattoon Post author

      Hi Sujit,

      Your storage vendor doesn’t matter for this. As long as the storage is visible in both locations and the vdsdevs exist on both sides (and match), your migration will work.

      You could do this a couple of ways. One is an NFS “pool” shared between the two servers, with disk images (disk1.img) living on it. The image file is exposed as a vdsdev on both nodes, and on the active node the vdsdev is handed to the ldom as a vdisk. The other is the FC way, where FC is provisioned to both servers, with vdsdevs exposed on both. The vdisk is then created on the active node and handed to the ldom.

      I use the latter; if you have FC available, it is a more performant solution than NFS.

      -matt