Today we are going to go through the process of creating a clustered file system on a pair of Oracle Linux 6.3 nodes. This exercise is not very resource intensive. I am using two VMs, each with 1GB of RAM, a single CPU, and a shared virtual disk file in addition to its OS drive.
The Basic Concepts
Why is a clustered file system important? If you need a shared volume between two hosts, you can provision the disk to both machines and everything may appear to work. However, as soon as both hosts write to the same areas of the disk at the same time (and that includes the file system's own metadata), you will end up with data corruption. The key is that you need a way to track locks across multiple nodes. This is the job of a Distributed Lock Manager, or DLM. To provide the DLM, OCFS2 forms a cluster, and valid cluster nodes can then mount the disk and interact with it as a normal disk. As part of OCFS2, two helper file systems are mounted: configfs at /sys/kernel/config and ocfs2_dlmfs at /dlm. The former holds the cluster configuration, and the latter is used by the distributed lock manager.
OCFS2 has been in the mainline Linux kernel since 2.6.16, so it is widely available. If you compile your own kernels, you will need to enable OCFS2 support in your kernel configuration. Beyond that, all you need are the userland tools to configure and interact with it.
Install OCFS2 Tools
# yum install ocfs2-tools
Load and Online the O2CB Service
# service o2cb load
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Creating directory '/dlm': OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
# service o2cb online
Setting cluster stack "o2cb": OK
Checking O2CB cluster configuration : Failed
Notice that when we bring o2cb online, it fails at checking the O2CB cluster configuration. This is expected, because there is no cluster configuration to check yet.
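To confirm that `service o2cb load` actually mounted the two helper file systems from the introduction, you can look for them in the mount table. Below is a minimal sketch. The function name `check_o2cb_mounts` is hypothetical, and the function assumes the standard `/proc/mounts` field layout (device, mount point, type, options, ...). It takes the mounts file as an optional argument, so you can also point it at a saved copy.

```shell
# Hypothetical helper: succeed only if configfs and ocfs2_dlmfs are
# mounted where "service o2cb load" puts them. Reads a /proc/mounts-style
# file (field 2 = mount point, field 3 = filesystem type).
check_o2cb_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '
        $2 == "/sys/kernel/config" && $3 == "configfs"    { cfg = 1 }
        $2 == "/dlm"               && $3 == "ocfs2_dlmfs" { dlm = 1 }
        END { exit !(cfg && dlm) }   # status 0 only when both were seen
    ' "$mounts_file"
}
```

Usage: `check_o2cb_mounts && echo "configfs and ocfs2_dlmfs are mounted"`.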
Create the OCFS2 Cluster Configuration
Now we need to create /etc/ocfs2/cluster.conf. This can be done with o2cb_ctl or by hand, though o2cb_ctl is considerably easier.
# o2cb_ctl -C -n prdcluster -t cluster -a name=prdcluster
Here we are naming our cluster prdcluster. The cluster itself doesn’t know anything about nodes until we add them in the next step.
Add Nodes to the OCFS2 Cluster Configuration
Create an entry for each node using the commands below. For each node we need its IP address, the port, the cluster name we defined before, a unique node number, and the node's host name.
# o2cb_ctl -C -n ocfs01 -t node -a number=0 -a ip_address=172.16.88.131 -a ip_port=11111 -a cluster=prdcluster
# o2cb_ctl -C -n ocfs02 -t node -a number=1 -a ip_address=172.16.88.132 -a ip_port=11111 -a cluster=prdcluster
The IP address and port are used for the cluster's node-to-node communication, including the heartbeat. The node name is used to verify cluster membership when a node attempts to join the cluster, so it needs to match the system's host name.
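A node whose host name does not match a `name =` entry will not be able to join, so it is worth checking this before going further. Below is a minimal sketch. `node_in_conf` is a hypothetical helper. It assumes the `key = value` layout of `cluster.conf` shown in the next section, with node names on `name =` lines inside `node:` stanzas. It also assumes that `uname -n` is the host name that has to match, so check whether your nodes use short names or fully qualified names.

```shell
# Hypothetical check: succeed if a host name appears as "name = <host>"
# inside a node: stanza of an OCFS2 cluster.conf. Defaults to this
# machine's node name as reported by uname -n.
node_in_conf() {
    conf=$1
    host=${2:-$(uname -n)}
    awk -v host="$host" '
        # Stanza headers ("node:", "cluster:") are single-field lines ending in ":".
        NF == 1 && $1 ~ /:$/ { in_node = ($1 == "node:"); next }
        in_node && $1 == "name" && $2 == "=" && $3 == host { found = 1 }
        END { exit !found }
    ' "$conf"
}
```

Usage: `node_in_conf /etc/ocfs2/cluster.conf || echo "$(uname -n) is not listed in cluster.conf"`. Because the check only looks inside `node:` stanzas, the cluster's own `name = prdcluster` line is never mistaken for a node.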
Review the OCFS2 Cluster Configuration
Now we can take a peek at the cluster.conf that our o2cb_ctl commands created.
# cat /etc/ocfs2/cluster.conf
node:
	name = ocfs01
	cluster = prdcluster
	number = 0
	ip_address = 172.16.88.131
	ip_port = 11111

node:
	name = ocfs02
	cluster = prdcluster
	number = 1
	ip_address = 172.16.88.132
	ip_port = 11111

cluster:
	name = prdcluster
	heartbeat_mode = local
	node_count = 2
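If you prefer the manual route mentioned earlier, the file has a simple stanza layout. As I understand the format, a stanza header such as `node:` or `cluster:` starts in the first column and ends with a colon, its parameters follow on tab-indented `key = value` lines, and stanzas are separated by a blank line. The sketch below emits that layout for the same two nodes. The function names `node_stanza` and `write_cluster_conf` are hypothetical. The function writes to whatever path you pass it, so you can generate the file somewhere safe and compare it with what `o2cb_ctl` produced. The copy placed in `/etc/ocfs2/cluster.conf` should be identical on every node.

```shell
# Print one tab-indented node: stanza followed by a blank line.
# $1 = host name, $2 = node number, $3 = IP address, $4 = port, $5 = cluster
node_stanza() {
    printf 'node:\n\tname = %s\n\tcluster = %s\n\tnumber = %s\n\tip_address = %s\n\tip_port = %s\n\n' \
        "$1" "$5" "$2" "$3" "$4"
}

# Hypothetical generator: write the same two-node configuration that the
# o2cb_ctl commands above produced, to the path given as $1.
write_cluster_conf() {
    out=$1
    {
        node_stanza ocfs01 0 172.16.88.131 11111 prdcluster
        node_stanza ocfs02 1 172.16.88.132 11111 prdcluster
        printf 'cluster:\n\tname = %s\n\theartbeat_mode = %s\n\tnode_count = %s\n' \
            prdcluster local 2
    } > "$out"
}
```

Usage: `write_cluster_conf /tmp/cluster.conf && diff /tmp/cluster.conf /etc/ocfs2/cluster.conf`.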
Configure the O2CB Service
For the cluster to start with the correct information on boot, we need to configure the o2cb service with the name of our cluster.
# service o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ('[]'). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]: prdcluster
Specify heartbeat dead threshold (>=7) :
Specify network idle timeout in ms (>=5000) :
Specify network keepalive delay in ms (>=1000) :
Specify network reconnect delay in ms (>=2000) :
Writing O2CB configuration: OK
Setting cluster stack "o2cb": OK
Registering O2CB cluster "prdcluster": OK
Setting O2CB cluster timeouts : OK
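To double-check what `service o2cb configure` saved, you can read it back. On Oracle Linux 6, my understanding is that the answers are stored as shell-style assignments in `/etc/sysconfig/o2cb`, with the boot cluster in `O2CB_BOOTCLUSTER`. Treat that file name and variable name as assumptions and verify them on your system. `boot_cluster` is a hypothetical helper.

```shell
# Hypothetical check: print the cluster that o2cb will start on boot.
# Assumes "service o2cb configure" stored it as O2CB_BOOTCLUSTER=<name>
# in /etc/sysconfig/o2cb (or in the file passed as $1).
boot_cluster() {
    conf=${1:-/etc/sysconfig/o2cb}
    # Use the last assignment and strip any surrounding double quotes.
    sed -n 's/^O2CB_BOOTCLUSTER=//p' "$conf" | tail -n 1 | tr -d '"'
}
```

Usage: `[ "$(boot_cluster)" = prdcluster ] && echo "o2cb will start prdcluster on boot"`.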
Offline and Online the O2CB Service
To make sure everything is working as expected, I like to take the service offline and bring it back online.
# service o2cb offline
Clean userdlm domains: OK
Stopping O2CB cluster prdcluster: Unregistering O2CB cluster "prdcluster": OK
We just want to confirm that it unregisters and then re-registers the correct cluster, in this case prdcluster.
# service o2cb online
Setting cluster stack "o2cb": OK
Registering O2CB cluster "prdcluster": OK
Setting O2CB cluster timeouts : OK
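If you script this step, you can check the output for the expected registration line instead of reading it by eye. Below is a minimal sketch. `registered_ok` is a hypothetical helper, and it assumes the plain-text output format shown above, where each status line ends in `: OK`.

```shell
# Hypothetical helper: read "service o2cb online" output on stdin and
# succeed only if the named cluster was registered with an OK status.
registered_ok() {
    cluster=$1
    # -F: match the text literally; -q: only the exit status matters.
    grep -qF "Registering O2CB cluster \"$cluster\": OK"
}
```

Usage: `service o2cb online | registered_ok prdcluster && echo "prdcluster registered"`.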
Repeat for All Nodes
All of the above steps need to be performed identically on every node in the cluster. Once every node reports Registering O2CB cluster "prdcluster": OK, you can move on.
Format Our Shared Disk
This part is no different from formatting any other file system. Keep in mind that the disk only needs to be formatted once, from one cluster node. Do not format it again from the other node.
# mkfs.ocfs2 /dev/xvdb
mkfs.ocfs2 1.8.0
Cluster stack: classic o2cb
Label:
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 53687091200 (13107200 clusters) (13107200 blocks)
Cluster groups: 407 (tail covers 11264 clusters, rest cover 32256 clusters)
Extent allocator size: 8388608 (2 groups)
Journal size: 268435456
Node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 3 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful
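The numbers in the `mkfs.ocfs2` summary are internally consistent, which is a quick sanity check that the whole shared disk was used. With a 4096-byte cluster size, the 53687091200-byte (50 GiB) volume works out to 13107200 clusters. Those clusters are split into 407 cluster groups: 406 full groups of 32256 clusters plus a tail group of 11264 clusters. The shell arithmetic below simply redoes that sum using the figures copied from the output above.

```shell
# Re-check the mkfs.ocfs2 summary with shell arithmetic.
volume_bytes=53687091200
cluster_bytes=4096
groups=407
per_group=32256    # clusters in each full cluster group
tail_group=11264   # clusters in the last (tail) group

clusters=$(( volume_bytes / cluster_bytes ))
from_groups=$(( (groups - 1) * per_group + tail_group ))
echo "clusters=$clusters from_groups=$from_groups"
# prints: clusters=13107200 from_groups=13107200
```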
Mount Our OCFS2 Volume
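You can either mount the volume manually with the mount command, or create an entry in /etc/fstab. Either way, the mount point has to exist first, so create it on each node (for example, mkdir -p /d01/share).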
# mount -t ocfs2 /dev/xvdb /d01/share
# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Wed Feb 27 13:44:01 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_system-lv_root / ext4 defaults 1 1
UUID=4b397e61-7954-40e9-943f-8385e46d263d /boot ext4 defaults 1 2
/dev/mapper/vg_system-lv_swap swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/xvdb /d01/share ocfs2 defaults 1 1
Then mount the volume using its /etc/fstab entry.
# mount /d01/share
Mounts will need to be configured on all cluster nodes.
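Because the fstab entry has to be added on every node, an idempotent append avoids duplicate lines when a setup script is re-run. Below is a minimal sketch. `add_fstab_entry` is hypothetical. There is one deliberate difference from the entry above. The helper writes `0 0` for the dump and fsck-pass fields and is meant to be called with `_netdev` in the options. That matches the form of the fstab example in the OCFS2 documentation, as I recall it: `_netdev` delays the mount until networking (and so the cluster) is up, and `0 0` keeps the boot-time fsck pass away from the shared volume. Treat this as a suggestion, not a requirement.

```shell
# Hypothetical helper: append an ocfs2 fstab entry for a mount point only
# if no non-comment line already uses that mount point, so re-running it
# is harmless. Dump/fsck-pass fields are written as "0 0".
add_fstab_entry() {
    fstab=$1 device=$2 mnt=$3 opts=$4
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mnt already present in $fstab"
    else
        printf '%s %s ocfs2 %s 0 0\n' "$device" "$mnt" "$opts" >> "$fstab"
    fi
}
```

Usage, on each node: `add_fstab_entry /etc/fstab /dev/xvdb /d01/share _netdev,defaults`, followed by `mount /d01/share`.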
Check Our Mounts
Once the volume is mounted, we need to make sure it shows up correctly.
# mount
/dev/mapper/vg_system-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/xvda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
configfs on /sys/kernel/config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/xvdb on /d01/share type ocfs2 (rw,_netdev,heartbeat=local)
Notice that /d01/share is mounted as ocfs2 with the options rw, _netdev, and heartbeat=local. These are the expected options, and they come from the configuration we set up earlier.
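On more than a couple of nodes, it is handy to check the mount programmatically. Below is a minimal sketch. `mount_info` is hypothetical, and it reads a `/proc/mounts`-style file. `/proc/mounts` shows the kernel's view, so userspace-only options such as `_netdev` may not appear there the way they do in the `mount` output above. For that reason, rely on the type field and treat the options field as informational.

```shell
# Hypothetical helper: print the filesystem type and options for a mount
# point from a /proc/mounts-style file (fields: device, mount point,
# type, options, ...). Prints nothing if the path is not a mount point.
mount_info() {
    mnt=$1
    mounts_file=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m { print $3, $4 }' "$mounts_file"
}
```

Usage: `[ "$(mount_info /d01/share | cut -d' ' -f1)" = ocfs2 ] && echo "/d01/share is ocfs2"`.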
Check Service Status
Finally, we can check the status of the o2cb service. It shows information about our cluster, its heartbeat, and the helper file systems needed to maintain the cluster (configfs and ocfs2_dlmfs).
# service o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "prdcluster": Online
  Heartbeat dead threshold: 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Local
Checking O2CB heartbeat: Active
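Two things in this output are worth turning into checks. First, the heartbeat dead threshold is a count of missed 2-second heartbeat intervals. As I understand Oracle's documentation for this setting, a node is declared dead after (threshold - 1) * 2 seconds, so the value of 31 here corresponds to 60 seconds. Second, you can test the status output for both the "Online" and "Active" lines. `hb_timeout_secs` and `o2cb_healthy` are hypothetical helpers, and the second one assumes the exact status wording shown above.

```shell
# Convert the heartbeat dead threshold into seconds:
# (threshold - 1) missed heartbeats, 2 seconds apart.
hb_timeout_secs() {
    threshold=$1
    echo $(( (threshold - 1) * 2 ))
}

# Hypothetical helper: read "service o2cb status" output on stdin and
# succeed only if the cluster is online and the heartbeat is active.
o2cb_healthy() {
    cluster=$1
    awk -v c="Checking O2CB cluster \"$cluster\": Online" '
        index($0, c)                                 { online = 1 }
        index($0, "Checking O2CB heartbeat: Active") { hb = 1 }
        END { exit !(online && hb) }
    '
}
```

Usage: `hb_timeout_secs 31` prints `60`, and `service o2cb status | o2cb_healthy prdcluster && echo healthy` confirms the state shown above.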