Release Found: Red Hat Enterprise Linux 5
When cluster nodes share storage devices, access to those devices must be controlled. In the event of a node failure, the failed node should not have access to the underlying storage devices. SCSI persistent reservations provide the capability to control the access of each node to shared storage devices. Red Hat Enterprise Linux 5 Advanced Platform employs SCSI persistent reservations as a fencing method through the use of the fence_scsi agent. The fence_scsi agent provides a method to revoke access to shared storage devices, provided that the storage supports SCSI persistent reservations.
Using SCSI reservations as a fencing method is quite different from traditional power fencing methods. It is very important to understand the software, hardware, and configuration requirements prior to using SCSI persistent reservations as a fencing method.
In order to understand how Red Hat Enterprise Linux 5 Advanced Platform is able to use SCSI persistent reservations as a fencing method, it is helpful to have some basic knowledge of SCSI persistent reservations.
There are two important concepts within SCSI persistent reservations that should be made clear: registrations and reservations.
A registration occurs when a node registers a unique key with a device. A device can have many registrations. For our purposes, each node will create a registration on each device.
A reservation dictates how a device can be accessed. In contrast to registrations, there can be only one reservation on a device at any time. The node that holds the reservation is known as the "reservation holder". The reservation defines how other nodes may access the device. For example, fence_scsi uses a "Write Exclusive, Registrants Only" reservation. This type of reservation indicates that only nodes that have registered with that device may write to the device.
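The relationship between registrations and a "Write Exclusive, Registrants Only" reservation can be illustrated with a small model. This is only a sketch of the access rule described above, not the SCSI protocol and not code from fence_scsi; the Device class and the key strings are hypothetical names chosen for the example.

```python
# Toy model of one shared device. Hypothetical names; not fence_scsi code.
class Device:
    def __init__(self):
        self.keys = set()    # registrations: a device can have many
        self.holder = None   # reservation: at most one per device

    def register(self, key):
        self.keys.add(key)

    def reserve(self, key):
        # A node must be registered before it can hold the reservation,
        # and only one node can hold it at a time.
        if key not in self.keys:
            raise ValueError("node must register before reserving")
        if self.holder not in (None, key):
            raise ValueError("device already has a reservation holder")
        self.holder = key

    def can_write(self, key):
        # With no reservation in place, writes are not restricted.
        if self.holder is None:
            return True
        # "Write Exclusive, Registrants Only": only registered keys may write.
        return key in self.keys


dev = Device()
dev.register("node-01-key")
dev.register("node-02-key")
dev.reserve("node-01-key")

print(dev.can_write("node-02-key"))  # registered node
print(dev.can_write("node-03-key"))  # node that never registered
```

In this model the second node may write even though it does not hold the reservation, because it is registered; the third node may not.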
The fence_scsi agent is able to perform fencing via SCSI persistent reservations by simply removing a node's registration key from all devices. When a node failure occurs, the fence_scsi agent will remove the failed node's key from all devices, thus preventing it from being able to write to those devices.
In order to use SCSI persistent reservations as a fencing method, several requirements must be met.
The sg3_utils package must also be installed. This package provides the tools needed by the various scripts to manage SCSI persistent reservations.
In order to use SCSI persistent reservations as a fencing method, all shared storage must use LVM2 cluster volumes. In addition, all devices within these volumes must be SPC-3 compliant. SCSI-2 devices are not supported. If you are unsure whether your cluster and shared storage environment meet these requirements, a script is available to determine if your shared storage devices are capable of using SCSI persistent reservations. See section 5.1.
In addition to these requirements, fencing by way of SCSI persistent reservations also has some limitations.
To assist with finding and detecting devices which are (or are not) suitable for use with fence_scsi, a tool has been provided. The fence_scsi_test script will find devices visible to the node and report whether or not they are compatible with SCSI persistent reservations.
Red Hat Enterprise Linux 5 Advanced Platform provides three components (scripts) to be used in conjunction with SCSI persistent reservations. The fence_scsi_test script provides a means to discover and test devices and report whether or not they are capable of SCSI persistent reservations. The scsi_reserve init script, if enabled, will run at node startup and discover shared storage devices and create registrations/reservations on each device using the node's unique key. The fence_scsi script, if configured as the fencing method, will remove a failed node's registration key from all known devices.
The fence_scsi_test script will find all devices visible to a node and report whether or not those devices are compatible with SCSI persistent reservations. There are two modes of operation for this script, and the user must explicitly state which mode to use by using the appropriate command-line option.
Cluster mode: specified with the '-c' flag on the command line. This mode is intended for use with an existing cluster environment. Specifically, this mode will discover all LVM2 cluster volumes and extract the devices within those volumes. In other words, only devices that exist within LVM2 cluster volumes will be tested.
SCSI mode: specified with the '-s' flag on the command line. This mode is intended to test all SCSI devices visible to the node, which is useful when planning the cluster volume configuration. Note that this mode will test all SCSI devices found in the /sys/block/ directory, which may include local SCSI devices.
In both modes, the devices found will be tested for compatibility. This is done by attempting to register with the devices. Successful registration indicates that the device is capable of performing SCSI persistent reservations. If registration is successful, the script will remove the registration.
Pay close attention to which devices report failure. If fence_scsi_test is run in "cluster mode" and reports devices that have failed the test, you must not use fence_scsi as your fencing method. If fence_scsi_test is run in "SCSI mode" and reports failures for devices, those devices must not be used for shared storage (LVM2 cluster volumes) if you wish to use fence_scsi as a fencing method.
Once you have verified that your cluster storage is compatible and meets the requirements necessary to use fence_scsi, you can enable the scsi_reserve init script. This can be done with the following command:
% chkconfig scsi_reserve on
When enabled, the scsi_reserve script handles creation of registrations and reservations at system startup.
The scsi_reserve init script will first generate the node's unique key. This key is based on the cluster ID and the node ID, so it is guaranteed to be unique. The next step in the scsi_reserve script depends on which parameter was used. The following options are allowed: start, stop, and status. Each case requires that the cluster manager (cman) be running. This is needed to extract information about the cluster and the individual node.
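To show why a key built from the cluster ID and the node ID is unique per node, here is a minimal sketch. The exact key format used by scsi_reserve is not given in this article, so the format below (cluster ID in hexadecimal followed by the node ID padded to four hexadecimal digits) is an assumption made for illustration only, and make_key is a hypothetical helper. The cluster ID 1234 and node IDs 1 to 3 are example values.

```python
def make_key(cluster_id, node_id):
    # ASSUMPTION: hex cluster ID followed by a four-digit hex node ID.
    # The real scsi_reserve format may differ.
    if cluster_id < 0 or not 0 <= node_id <= 0xFFFF:
        raise ValueError("cluster_id must be >= 0 and node_id must fit in 16 bits")
    return "%x%04x" % (cluster_id, node_id)


# Example values only: one cluster ID, three node IDs.
keys = [make_key(1234, node_id) for node_id in (1, 2, 3)]
print(keys)

# Because the node ID always occupies the last four digits, two nodes in
# the same cluster can never produce the same key.
print(len(set(keys)) == len(keys))
```

Whatever the real format is, the point is the same: the key can be derived predictably from values every node already knows.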
Running the scsi_reserve init script with the 'start' option will proceed to create registrations on all devices that were previously discovered. If necessary, it will also create the reservation. The script will report success or failure. Success indicates that the node was capable of registering with all devices that were discovered. Failure indicates that the script was unable to register with one or more devices. Should a failure occur, the cluster has no way of completely fencing a node in the event of a node failure.
It is important to note that 'scsi_reserve start' should be run before mounting the file system. The reason for this is that if you already have a file system mounted and then create a reservation on any of the devices used by that file system, any node that is not registered with those devices will be unable to write to the file system.
When scsi_reserve is run with the 'stop' command, it will attempt to remove the node's registration key from all devices that it registered with at startup. Removing the registration is only a problem if that node is also the reservation holder and other nodes are still registered with the device(s). In this case, the node will not be able to unregister since doing so would also release the reservation. Note that the script will report failure when attempting to remove a node's registration if it is the reservation holder and other registrations exist.
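The rule for 'stop' described above can be written as a single condition. The function below is a hypothetical illustration of that rule, not code taken from the scsi_reserve script.

```python
def can_unregister(registered_keys, holder, key):
    """Return True if 'key' may be removed from the device.

    Removal is refused only when 'key' belongs to the reservation holder
    and at least one other node is still registered, because removing it
    would also release the reservation those nodes rely on.
    """
    others = set(registered_keys) - {key}
    return not (key == holder and others)


# The holder cannot leave while another node is still registered.
print(can_unregister({"key-a", "key-b"}, holder="key-a", key="key-a"))
# A node that is not the holder can always leave.
print(can_unregister({"key-a", "key-b"}, holder="key-a", key="key-b"))
# The holder can leave once it is the only registrant.
print(can_unregister({"key-a"}, holder="key-a", key="key-a"))
```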
When the scsi_reserve script is run with the 'status' command, it will list the devices that the node is registered with.
The fence_scsi script is the actual fence agent that is run when node failure occurs. Typically this script will not be run manually, but rather invoked by the fence domain. Using this script manually will remove a node's registrations from all devices, but will not remove the node from the cluster.
When a node is fenced using fence_scsi, the agent simply removes the specified node's registrations from all devices. This prevents write access to those devices. In the special case where the node being fenced is also the reservation holder, the node that is performing the fence operation will become the new reservation holder.
Note that if the node being fenced has the file system mounted, removing its registrations prevents the node from accessing the file system. This sudden inability to access the devices upon which the file system exists may result in I/O errors and a subsequent withdraw from the file system. This behavior is expected.
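The fencing behavior described in this section, including the special case where the fenced node holds the reservation, can be sketched as follows. The dictionaries, key strings, and the fence and can_write functions are hypothetical names used to model the behavior; they do not reflect how fence_scsi is implemented.

```python
def fence(devices, victim_key, fencer_key):
    """Remove victim_key from every device, handing over the reservation if needed."""
    for dev in devices:
        dev["keys"].discard(victim_key)
        if dev["holder"] == victim_key:
            # Special case: the fenced node held the reservation, so the
            # node performing the fence becomes the new reservation holder.
            dev["holder"] = fencer_key


def can_write(dev, key):
    # Write Exclusive, Registrants Only: only registered keys may write.
    return key in dev["keys"]


# Two shared devices; node A holds the reservation on the first one.
devices = [
    {"keys": {"key-a", "key-b"}, "holder": "key-a"},
    {"keys": {"key-a", "key-b"}, "holder": "key-b"},
]

fence(devices, victim_key="key-a", fencer_key="key-b")

print([can_write(dev, "key-a") for dev in devices])  # fenced node
print([can_write(dev, "key-b") for dev in devices])  # surviving node
print([dev["holder"] for dev in devices])
```

After the call, the fenced node's key is gone from both devices and the surviving node holds the reservation on each of them.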
Below is a sample configuration (cluster.conf) for a cluster that uses SCSI persistent reservations as its fence method. Note that each node defines its fence device and passes its node name to the agent via the "node" attribute.
Also note that each node explicitly defines its "nodeid". This is required for all clusters that use fence_scsi as the fence method. The "nodeid" attribute must be defined so that the various SCSI reservation scripts can predictably generate the node's unique registration key.
<?xml version="1.0"?>
<cluster config_version="1" name="my_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="30"/>
  <clusternodes>
    <clusternode name="node-01" votes="1" nodeid="1">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-02" votes="1" nodeid="2">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-03" votes="1" nodeid="3">
      <fence>
        <method name="scsi">
          <device name="fence_dev" node="node-03"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman cluster_id="1234"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" name="fence_dev"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
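The two points noted for the sample configuration (every node defines "nodeid", and every node passes its own name through the "node" attribute of a fence_scsi device) can be checked mechanically. The following is a minimal sketch using Python's standard XML parser; check_cluster_conf is a hypothetical helper, not a Red Hat tool, and it checks only these two points. The embedded sample is a shortened configuration in which node-02 deliberately omits "nodeid".

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of problems relevant to fence_scsi; empty if none found."""
    root = ET.fromstring(xml_text)
    # Names of fence devices that use the fence_scsi agent.
    scsi_devices = {
        fd.get("name")
        for fd in root.iter("fencedevice")
        if fd.get("agent") == "fence_scsi"
    }
    problems = []
    seen_ids = set()
    for node in root.iter("clusternode"):
        name = node.get("name")
        nodeid = node.get("nodeid")
        if nodeid is None:
            problems.append("%s: missing nodeid" % name)
        elif nodeid in seen_ids:
            problems.append("%s: duplicate nodeid %s" % (name, nodeid))
        else:
            seen_ids.add(nodeid)
        refs = [d for d in node.iter("device") if d.get("name") in scsi_devices]
        if not refs:
            problems.append("%s: no fence_scsi device" % name)
        for dev in refs:
            if dev.get("node") != name:
                problems.append("%s: device node attribute is %r" % (name, dev.get("node")))
    return problems


# Shortened example; node-02 has no nodeid on purpose.
SAMPLE = """<cluster config_version="1" name="my_cluster">
  <clusternodes>
    <clusternode name="node-01" votes="1" nodeid="1">
      <fence><method name="scsi"><device name="fence_dev" node="node-01"/></method></fence>
    </clusternode>
    <clusternode name="node-02" votes="1">
      <fence><method name="scsi"><device name="fence_dev" node="node-02"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices><fencedevice agent="fence_scsi" name="fence_dev"/></fencedevices>
</cluster>"""

print(check_cluster_conf(SAMPLE))
```

For the shortened sample the function reports the missing "nodeid" on node-02. To check a real file, read its contents and pass the text to the function.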
sg3_utils page: http://sg.danny.cz/sg/sg3_utils.html