The cluster node name needs to match the output of uname -n or the value of HOSTNAME in /etc/sysconfig/network. The cluster node names need to be the fully qualified domain name.
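The naming rule above can be checked before a name goes into cluster.conf. The following is a minimal sketch, not part of the original procedure; the function names is_fqdn and check_node_name are hypothetical, and the FQDN test is only a rough "contains a dot" check.

```shell
#!/bin/sh
# Sketch: sanity-check a cluster node name (helper names are hypothetical).

# is_fqdn NAME: succeed if NAME has a dot with text on both sides.
# This is a rough check, not full DNS syntax validation.
is_fqdn() {
    case "$1" in
        .*|*.) return 1 ;;
        *.*)   return 0 ;;
        *)     return 1 ;;
    esac
}

# check_node_name NAME: succeed only if NAME looks fully qualified
# and is identical to the output of `uname -n` on this machine.
check_node_name() {
    name=$1
    actual=$(uname -n)
    if ! is_fqdn "$name"; then
        echo "WARN: '$name' is not fully qualified" >&2
        return 1
    fi
    if [ "$name" != "$actual" ]; then
        echo "WARN: '$name' does not match uname -n ('$actual')" >&2
        return 1
    fi
    echo "OK: $name"
}
```

Running check_node_name with the name planned for this node prints OK only when both conditions hold.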
By default, heartbeat and cluster communication traffic go out over eth0. If this traffic needs to go out over another interface, follow this link:
http://sources.redhat.com/cluster/faq.html#cman_heartbeat_nic
With cluster communication going over eth0, the IP address and hostname of every node in cluster.conf need to be added to the /etc/hosts file. DNS is not reliable enough for clustering.
To start:
$service ccsd start
$service cman start
$service fenced start
$service clvmd start
$service gfs start
$service rgmanager start
To stop:
$service rgmanager stop
$service gfs stop
$service clvmd stop
$service fenced stop
$service cman stop
$service ccsd stop
A cluster "service" is made up of cluster resources, components that can be failed over from one node to another, such as an IP address, an application initialization script, or a Red Hat GFS shared partition.
Cluster services can be connected with a "failover domain", a subset of cluster nodes that are eligible to run a particular cluster service.
In other words, a failover domain determines which nodes can run the services associated with it, and a service is a collection of resources that are started or stopped together.
A failover domain is not required for every service. A failover domain can be associated with more than one service.
The file /etc/hosts should contain all cluster node names and all fence device names (both of which will be defined in cluster.conf). Every node and every fence device in the cluster should be listed here. The /etc/hosts file should look very similar on all the nodes.
#update all packages on all nodes
$up2date -uf
$up2date --show-channels
Install all the packages for cluster-suite. This will install some extra kernels; once the system has booted, the ones that are not needed can be uninstalled.
$up2date --installall=<channel name for cluster-suite>
Or just install the ones that are wanted:
Standard kernel:
$up2date cman cman-kernel dlm dlm-kernel magma magma-plugins system-config-cluster rgmanager ccs fence
SMP kernel:
$up2date cman cman-kernel-smp dlm dlm-kernel-smp magma magma-plugins system-config-cluster rgmanager ccs fence
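Choosing between the two package lists above can be scripted. This is a minimal sketch under one stated assumption: that the SMP kernel's release string (as printed by uname -r) ends in "smp", for example 2.6.9-42.ELsmp. The function name cluster_packages is hypothetical, and other kernel variants are not handled.

```shell
#!/bin/sh
# Sketch: pick the package list for the running kernel.
# Assumption: SMP kernel release strings end in "smp".

# cluster_packages [KERNEL_RELEASE]: print the matching package list.
# Defaults to the release reported by `uname -r`.
cluster_packages() {
    release=${1:-$(uname -r)}
    common="magma magma-plugins system-config-cluster rgmanager ccs fence"
    case "$release" in
        *smp) echo "cman cman-kernel-smp dlm dlm-kernel-smp $common" ;;
        *)    echo "cman cman-kernel dlm dlm-kernel $common" ;;
    esac
}
```

The result can then be passed to the installer, for example: up2date $(cluster_packages)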
Run this command, and add one node to the cluster.
$system-config-cluster
Then, change the name of the cluster from the default name and increase the version number.
$vi /etc/cluster/cluster.conf
<cluster config_version="1" name="alpha_cluster">
change to
<cluster config_version="2" name="my_cluster">
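The same edit can be made without an editor. The sketch below is an illustration only: bump_config_version is a hypothetical helper, it assumes the opening cluster tag sits on a single line with a numeric config_version attribute, and it assumes the new cluster name contains no characters that are special to sed (such as / or &). Keeping a backup copy of cluster.conf first is advisable.

```shell
#!/bin/sh
# Sketch: increase config_version by one and optionally rename the cluster.
# Assumption: <cluster ...> is on one line; NEW_NAME has no sed metacharacters.

# bump_config_version FILE [NEW_NAME]: edit FILE in place, print the new version.
bump_config_version() {
    file=$1
    new_name=$2
    # Read the current version from the opening <cluster ...> tag.
    current=$(sed -n 's/.*<cluster [^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$file" | head -n 1)
    if [ -z "$current" ]; then
        echo "no config_version found in $file" >&2
        return 1
    fi
    next=$((current + 1))
    tmp="$file.tmp.$$"
    # Rewrite only the version attribute of the <cluster ...> tag.
    sed "s/\(<cluster [^>]*config_version=\)\"[0-9][0-9]*\"/\1\"$next\"/" "$file" > "$tmp" || return 1
    if [ -n "$new_name" ]; then
        # Rewrite the name attribute of the same tag.
        sed "s/\(<cluster [^>]*name=\)\"[^\"]*\"/\1\"$new_name\"/" "$tmp" > "$tmp.2" || return 1
        mv "$tmp.2" "$tmp"
    fi
    mv "$tmp" "$file"
    echo "$next"
}
```

For the example above, bump_config_version /etc/cluster/cluster.conf my_cluster would turn version 1 into 2 and rename alpha_cluster to my_cluster.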
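On each of the other nodes, create the cluster configuration directory if it does not already exist:
$mkdir /etc/cluster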
Then, from the node where the cluster.conf file was created, send it to the other nodes as follows:
$scp /etc/cluster/cluster.conf root@nodeX:/etc/cluster/cluster.conf
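With more than two nodes, the copy above has to be repeated once per node. A minimal sketch of that loop follows; push_to_nodes and the DRY_RUN variable are hypothetical names, and the node names in the usage line are placeholders. By default the sketch only prints the scp commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: copy one file to the same path on several nodes.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

# push_to_nodes FILE NODE...: copy FILE to root@NODE:FILE for every NODE.
push_to_nodes() {
    file=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp $file root@$node:$file"
        else
            scp "$file" "root@$node:$file" || echo "copy to $node failed" >&2
        fi
    done
}
```

Example: DRY_RUN=0 push_to_nodes /etc/cluster/cluster.conf node2 node3. The same helper could be used for /etc/hosts.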
Verify that /etc/hosts contains all the node names and fence device names defined in /etc/cluster/cluster.conf, and then propagate it to all nodes as follows:
$scp /etc/hosts root@nodeX:/etc/hosts
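The verification mentioned above can be partly automated before the file is copied. This sketch covers node names only, not fence devices, and assumes that each clusternode element in cluster.conf starts on its own line with a name attribute. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report cluster node names that have no entry in a hosts file.
# Assumption: one <clusternode ... name="..."> element per line.

# conf_node_names CLUSTER_CONF: print each clusternode name, one per line.
conf_node_names() {
    sed -n 's/.*<clusternode [^>]*name="\([^"]*\)".*/\1/p' "$1"
}

# missing_from_hosts CLUSTER_CONF HOSTS_FILE: print the names that are
# missing from HOSTS_FILE; succeed only if nothing is missing.
missing_from_hosts() {
    conf=$1
    hosts=$2
    missing=0
    for n in $(conf_node_names "$conf"); do
        # Strip comments, then look for the name as a whole field after the IP.
        if ! sed 's/#.*//' "$hosts" |
            awk -v n="$n" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                           END { exit !found }'; then
            echo "$n"
            missing=1
        fi
    done
    return $missing
}
```

Example: missing_from_hosts /etc/cluster/cluster.conf /etc/hosts prints nothing and returns success when every node name has an entry.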
Run each of the following commands on all nodes before proceeding to the next command. Keeping an ssh session open to each node is useful for this. (If GFS is not installed, clvmd and gfs do not need to be started or stopped. If no cluster services are configured, rgmanager does not need to be started or stopped.)
To start:
$service ccsd start
$service cman start
$service fenced start
$service clvmd start
$service gfs start
$service rgmanager start
To stop:
$service rgmanager stop
$service gfs stop
$service clvmd stop
$service fenced stop
$service cman stop
$service ccsd stop
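The ordering rule above (each command on every node before the next command, with the stop order being the reverse of the start order) can be sketched as a nested loop. This is an illustration under assumptions: cluster_services, SKIP and DRY_RUN are hypothetical names, root ssh access to every node is assumed, and the ssh calls run one after another, so a daemon that waits for its peers before returning could stall the loop. Parallel ssh sessions, as suggested above, avoid that. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch: run each init script on all nodes before moving to the next one.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
# SKIP may list services to leave out, e.g. SKIP="clvmd gfs" without GFS.

START_ORDER="ccsd cman fenced clvmd gfs rgmanager"

# cluster_services start|stop NODE...
cluster_services() {
    action=$1
    shift
    order=$START_ORDER
    if [ "$action" = "stop" ]; then
        # Stopping uses the start order reversed.
        order=""
        for s in $START_ORDER; do order="$s $order"; done
    fi
    for s in $order; do
        case " ${SKIP:-} " in *" $s "*) continue ;; esac
        for node in "$@"; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "ssh root@$node service $s $action"
            else
                ssh "root@$node" service "$s" "$action" || return 1
            fi
        done
    done
}
```

Example: cluster_services start node1 node2 prints the twelve commands in the order they would be run.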
Once all the services are started on all nodes, check that each node has the correct information and reports that it can see the other nodes, as follows:
$cman_tool nodes
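Comparing that listing against the expected node names by eye gets tedious on larger clusters. The sketch below reads the listing on standard input and reports expected names that do not appear. It assumes the node name is the last column of each line and does not interpret any status column, so a node that is listed but not a member is not detected. The function name nodes_missing is hypothetical.

```shell
#!/bin/sh
# Sketch: find expected nodes that are absent from a `cman_tool nodes` listing.
# Assumption: the node name is the last whitespace-separated field on a line.

# nodes_missing NODE...: read the listing on stdin, print every NODE that
# is not listed, and succeed only if all of them were found.
nodes_missing() {
    listing=$(cat)
    rc=0
    for n in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v n="$n" '$NF == n { found = 1 } END { exit !found }'; then
            echo "$n"
            rc=1
        fi
    done
    return $rc
}
```

Example: cman_tool nodes | nodes_missing node1.example.com node2.example.com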
First, create a fail-over domain containing the nodes.
Then create a service for the virtual IP address. In the left-hand corner, make sure the service is not set to auto-start. In the upper right-hand corner, set the fail-over domain to the one that was just created. Save the file and propagate it to the other nodes, then start rgmanager with this command: service rgmanager start.
To enable a service:
$clusvcadm -e servicename
To stop and disable a service:
$clusvcadm -d servicename
To check that it is working, run these two tests:
$ip addr
$ping X.X.X.X
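The first test amounts to looking for the virtual IP address in the ip addr listing, which can be scripted. This is a minimal sketch: has_ip is a hypothetical helper, and 192.0.2.10 in the usage line is a placeholder for the real virtual IP address.

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address appears in `ip addr` output.

# has_ip ADDRESS: read `ip addr` output on stdin and succeed if ADDRESS
# is configured on any interface (the prefix length is ignored).
has_ip() {
    awk -v ip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == ip) found = 1 }
        END { exit !found }'
}
```

Example: ip addr | has_ip 192.0.2.10 && ping -c 3 192.0.2.10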
Once that works, move to next step.
The name of the fence device is defined in the cluster.conf file. To read its manual page, run man fence_nameoffencedevice.
**This is a real-world example of testing fail-over. Make sure that nothing mission critical is running on that node.**
Start the virtual IP service on a node, then manually crash that node. To set up the node so that it can be sent a test panic that will crash it, do the following.
Turn sysrq on at boot time. Open /etc/sysctl.conf and add or change the line kernel.sysrq = 1. This turns sysrq on.
If sysrq should not stay on permanently, the following command turns it on only until the system is rebooted:
$echo 1 > /proc/sys/kernel/sysrq
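For the permanent variant, the "add or change the line" step can be done with a small helper instead of an editor. This is a sketch only; set_sysrq is a hypothetical name, and commented-out lines are deliberately left alone.

```shell
#!/bin/sh
# Sketch: ensure a sysctl-style file contains "kernel.sysrq = 1".

# set_sysrq FILE: change any existing kernel.sysrq line to 1,
# or append the setting if no such line exists.
set_sysrq() {
    file=$1
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$file"; then
        tmp="$file.tmp.$$"
        sed 's/^[[:space:]]*kernel\.sysrq[[:space:]]*=.*/kernel.sysrq = 1/' "$file" > "$tmp" &&
            mv "$tmp" "$file"
    else
        echo 'kernel.sysrq = 1' >> "$file"
    fi
}
```

Example: set_sysrq /etc/sysctl.conf. Running sysctl -p afterwards applies the file without waiting for a reboot.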
Next, start the service on this node once sysrq is enabled by issuing this command:
$clusvcadm -e servicename
Now trigger the panic with the command echo 'c' > /proc/sysrq-trigger.
If the fencing worked, the node should be rebooted and should be reported as offline. It may take a minute or two to get status and fence the node.
Watch for missed heartbeats by opening a new terminal, and issuing the command tail -f /var/log/messages.
Once the node has been fenced, run cman_tool nodes; it should show the fenced node as offline.
Once the node is offline, the service should be running on another node. Use the command ip addr on that node to confirm this.
The machine should panic and then be rebooted. If fail-over works and the IP address moves to a different machine, then clustering is working. Now start adding the services that are needed on the cluster.
Cluster sources FAQ, lots of information
Log cluster logs to a different file
How do I install the Cluster Suite packages in RHEL 4 via RHN?