RLM and SLM are both implemented by the LOCK_GULM system.
LOCK_GULM is based on a central server daemon that manages lock and cluster state for all GFS/LOCK_GULM file systems in the cluster. In the case of RLM, multiple servers can be run redundantly on multiple nodes. If the master server fails, another "hot-standby" server takes over.
The LOCK_GULM server daemon is called lock_gulmd. The kernel module for GFS nodes using LOCK_GULM is called lock_gulm.o. The lock protocol (LockProto) as specified when creating a GFS/LOCK_GULM file system is called lock_gulm (lower case, with no .o extension).
Note: You can use the lock_gulmd init.d script included with GFS to automate starting and stopping lock_gulmd. For more information about GFS init.d scripts, refer to Chapter 12, Using GFS init.d Scripts.
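For example, on a node with the GFS init.d scripts installed, the daemon could be managed through the standard service tools. The following commands are a sketch and assume the script is installed under the service name lock_gulmd:

    # Start the LOCK_GULM server daemon through its init.d script
    service lock_gulmd start

    # Stop it again
    service lock_gulmd stop

    # Optionally, arrange for it to start automatically at boot
    chkconfig lock_gulmd on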
The nodes selected to run the lock_gulmd server are specified in the cluster.ccs configuration file (cluster.ccs:cluster/lock_gulm/servers). Refer to Section 6.5 Creating the cluster.ccs File.
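As a minimal sketch, a cluster.ccs file naming three lock_gulmd server nodes might look like the following. The cluster name alpha and the node names n01, n02, and n03 are placeholders; Section 6.5 Creating the cluster.ccs File remains the authoritative reference for the syntax.

    cluster {
            name = "alpha"
            lock_gulm {
                    servers = ["n01", "n02", "n03"]
            }
    }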
For optimal performance, lock_gulmd servers should be run on dedicated nodes; however, they can also be run on nodes using GFS. All nodes, including those only running lock_gulmd, must be listed in the nodes.ccs configuration file (nodes.ccs:nodes).
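Continuing the sketch, each node named above, including any node that only runs lock_gulmd, would also have an entry in nodes.ccs. The interface names and IP addresses below are placeholders, and the per-node fencing sections are omitted for brevity:

    nodes {
            n01 {
                    ip_interfaces {
                            eth0 = "10.0.1.1"
                    }
            }
            n02 {
                    ip_interfaces {
                            eth0 = "10.0.1.2"
                    }
            }
            n03 {
                    ip_interfaces {
                            eth0 = "10.0.1.3"
                    }
            }
    }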
You can use just one lock_gulmd server; however, if it fails, the entire cluster that depends on it must be reset. For that reason, you can run multiple instances of the lock_gulmd server daemon on multiple nodes for redundancy. The redundant servers allow the cluster to continue running if the master lock_gulmd server fails.
Over half of the lock_gulmd servers on the nodes listed in the cluster.ccs file (cluster.ccs:cluster/lock_gulm/servers) must be operating to process locking requests from GFS nodes. That quorum requirement is necessary to prevent split groups of servers from forming independent clusters — which would lead to file system corruption.
For example, if there are three lock_gulmd servers listed in the cluster.ccs configuration file, two of those three lock_gulmd servers (a quorum) must be running for the cluster to operate.
A lock_gulmd server can rejoin existing servers if it fails and is restarted.
When running redundant lock_gulmd servers, the minimum number of nodes required is three; the maximum number of nodes is five.
If no lock_gulmd servers are running in the cluster, take caution before restarting them: first verify that no GFS nodes are hung from a previous instance of the cluster. If there are hung GFS nodes, reset them before starting lock_gulmd servers; doing so prevents file system corruption. Also, be sure that all nodes running lock_gulmd can communicate over the network; that is, there is no network partition.
The lock_gulmd server is started with no command line options.
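For example, on each server node the daemon is simply invoked by name:

    # Start the LOCK_GULM server daemon; no command line options are needed
    lock_gulmd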
Cluster state is managed in the lock_gulmd server. When GFS nodes or server nodes fail, the lock_gulmd server initiates a fence operation for each failed node and waits for the fence to complete before proceeding with recovery.
The master lock_gulmd server fences failed nodes by calling the fence_node command with the name of the failed node. That command looks up fencing configuration in CCS to carry out the fence operation.
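For illustration, if a node known to the cluster as n02 (a placeholder name) failed, the master server would in effect issue:

    fence_node n02

The fencing agent and parameters used for n02 come from that node's fencing configuration in CCS.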
When using RLM, you must use a fencing method that shuts down and reboots the failed node; fencing methods that do not reboot the node cannot be used with RLM.
Before shutting down a node running a LOCK_GULM server, lock_gulmd should be terminated using the gulm_tool command. If lock_gulmd is not properly stopped, the LOCK_GULM server may be fenced by the remaining LOCK_GULM servers.
Caution: Shutting down one of multiple redundant LOCK_GULM servers may result in suspension of cluster operation if the remaining number of servers is half or less of the total number of servers listed in the cluster.ccs file (cluster.ccs:cluster/lock_gulm/servers).
Usage:

    gulm_tool shutdown IPAddress

IPAddress
    Specifies the IP address or hostname of the node running the instance of lock_gulmd to be terminated.
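For example, to stop the lock_gulmd instance running on a node reachable as n01 (a placeholder hostname), you would run:

    gulm_tool shutdown n01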