Red Hat Knowledgebase

Article Reference

Article ID: 10700
Last update: 09-06-07
Issue:
How do I set up a Red Hat Enterprise Linux 4 cluster with two nodes?
Resolution:

Before Starting

  1. The /etc/cluster/cluster.conf uses /etc/hosts for node-name look-up.

    The cluster node name needs to match the output of uname -n or the value of HOSTNAME in /etc/sysconfig/network. Each cluster node name needs to be the fully qualified domain name.

    By default, heartbeat and communication traffic goes out over eth0. If heartbeat and communication traffic need to go out over another interface, follow this link:

    http://sources.redhat.com/cluster/faq.html#cman_heartbeat_nic

    With cluster communication going over eth0, the IP address and hostname for that interface need to be added to the /etc/hosts file. Add every node listed in cluster.conf to the /etc/hosts file, because DNS is not reliable enough for clustering.

  2. The cluster.conf has to be the same on all nodes. All the cluster.conf files are stored in /etc/cluster.
  3. Red Hat Cluster Suite and GFS will behave unpredictably if fencing is not set up. Please note, "manual fencing" is for testing purposes only. GFS and Cluster Suite need a power or fiber switch fence.
  4. What services need to be started and stopped? (If GFS is not installed, clvmd and gfs will not need to be started or stopped. If services are not installed, then rgmanager will not need to be started or stopped.)

    To start:

    $service ccsd start
    $service cman start
    $service fenced start
    $service clvmd start
    $service gfs start
    $service rgmanager start
    

    To stop:

    $service rgmanager stop
    $service gfs stop
    $service clvmd stop
    $service fenced stop
    $service cman stop
    $service ccsd stop
    
  5. What do resources, services, and failover domain mean?

    A cluster "service" is made up of cluster resources, components that can be failed over from one node to another, such as an IP address, an application initialization script, or a Red Hat GFS shared partition.

    Cluster services can be connected with a "failover domain", a subset of cluster nodes that are eligible to run a particular cluster service.

    In other words, a failover domain defines which nodes can run the services associated with it, and a service is the collection of resources that are started or stopped together.

    A service does not require a failover domain, and one failover domain can be associated with more than one service.
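
The start and stop ordering listed in this section can be captured in a small helper so that the stop order is always the exact reverse of the start order. This is only a sketch: the function names service_order and run_cluster_services, and the CLUSTER_SERVICES and DRY_RUN variables, are hypothetical names chosen for illustration and are not part of Cluster Suite. It assumes a Bourne-compatible shell and the service command used above.

```shell
# Start order taken from the list above; override CLUSTER_SERVICES to
# drop clvmd/gfs or rgmanager when they are not in use (assumed variable).
CLUSTER_SERVICES="${CLUSTER_SERVICES:-ccsd cman fenced clvmd gfs rgmanager}"

# service_order ACTION: print the services in the order ACTION should visit them.
service_order() {
    local action="$1" svc reversed=""
    case "$action" in
        start)
            echo "$CLUSTER_SERVICES"
            ;;
        stop)
            # Build the reverse of the start order.
            for svc in $CLUSTER_SERVICES; do
                reversed="$svc${reversed:+ $reversed}"
            done
            echo "$reversed"
            ;;
        *)
            echo "usage: service_order start|stop" >&2
            return 1
            ;;
    esac
}

# run_cluster_services ACTION: run each "service NAME ACTION" in order.
# With DRY_RUN=1 the commands are only printed, not executed.
run_cluster_services() {
    local action="$1" svc order
    order=$(service_order "$action") || return 1
    for svc in $order; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc $action"
        else
            service "$svc" "$action" || return 1
        fi
    done
}
```

Running it with DRY_RUN=1 first (for example, DRY_RUN=1 followed by run_cluster_services stop) shows the commands without touching the cluster. Note that the procedure later in this article starts each service on all nodes before moving to the next service, so a per-node loop like this one is mainly useful for stopping a node or for single-node testing.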

Installing Cluster Suite

  1. Set up the environment first. This includes networking and host names. Ping the boxes using the names that will be in cluster.conf to ensure they resolve.

    The file /etc/hosts should contain all cluster node names and all fence device names that will be defined in cluster.conf. The /etc/hosts file should look very similar on all the nodes.

  2. Set up storage such as multipath, connection to SAN, etc.
  3. Make sure all nodes see the same storage. For example, if LVM is in use, the device order does not have to be the same on every node, because LVM addresses the volume group rather than the physical device.
  4. Install cluster-suite on all the potential nodes. The steps for doing this are outlined below.
    1. Register the boxes with rhn_register.
    2. Add the Cluster Suite and GFS (if desired) channels in Red Hat Network.
    3. Update and install the Cluster Suite packages.
      #update all packages on all nodes
      $up2date -uf
      $up2date --show-channels
      

      Install all the packages for cluster-suite. This will install some extra kernels, which can be uninstalled after the machine boots up.

      $up2date --installall=<channel name for cluster-suite>
      

      Or just install the ones that are wanted:

      Standard kernel
      $up2date cman cman-kernel dlm dlm-kernel magma magma-plugins system-config-cluster rgmanager ccs fence
      
      SMP kernel
      $up2date cman cman-kernel-smp dlm dlm-kernel-smp magma magma-plugins system-config-cluster rgmanager ccs fence
      
    4. Reboot into the new kernel if it was updated. The cluster services are expected to fail when the machine boots or when a service is started by hand, since there is no cluster.conf yet.
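
The /etc/hosts check described in step 1 of this section can be scripted before any cluster.conf exists. The sketch below is one possible approach, not a Red Hat tool: missing_from_hosts is a hypothetical function name, and the node and fence device names in the usage note are placeholders. It only inspects the hosts file passed to it and does not test DNS or connectivity.

```shell
# missing_from_hosts HOSTS_FILE NAME...
# Print each NAME that does not appear as a host name or alias in HOSTS_FILE.
missing_from_hosts() {
    local hosts="$1" name
    shift
    for name in "$@"; do
        # awk compares whole fields, so "node1" does not match "node10".
        # Field 1 is the IP address, so the comparison starts at field 2.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }   # ignore comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts"; then
            echo "$name"
        fi
    done
}
```

For example, missing_from_hosts /etc/hosts node1.example.com node2.example.com fence1.example.com prints nothing when every name has an entry, and prints the missing names otherwise. Running the same command on every node is a quick way to confirm the hosts files agree.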

Get Cluster Suite Working

  1. Start simple! Don't do any complex setups. If one service fails over, then this shows the cluster works correctly.
  2. Set up a minimal config file with just one node in it.

    Run this command, and add one node to the cluster.

    $system-config-cluster
    

    Then, change the name of the cluster from the default name and increase the version number.

    $vi /etc/cluster/cluster.conf
    
    <cluster config_version="1" name="alpha_cluster">
    change to
    <cluster config_version="2" name="my_cluster">
    
  3. Now add all the other nodes and fence devices using system-config-cluster. Before copying the file to another node (nodeX), create the directory on that node as follows:
    $mkdir /etc/cluster
    

    Then, from the node where the cluster.conf file was created, send it to the other nodes as follows:

    $scp /etc/cluster/cluster.conf root@nodeX:/etc/cluster/cluster.conf
    

    Verify that /etc/hosts contains all the node names and fence device names defined in /etc/cluster/cluster.conf, and then propagate it to all nodes as follows:

    $scp /etc/hosts root@nodeX:/etc/hosts
    
  4. Once the cluster.conf file is on all nodes and the /etc/hosts file is basically the same on all nodes, then start the clustering on all nodes.

    Run each of the following commands on all nodes before proceeding to the next command. Keeping an ssh session open to each node will be useful. (If GFS is not installed, clvmd and gfs will not need to be started or stopped. If services are not installed, then rgmanager will not need to be started or stopped.)

    To start:

    $service ccsd start
    $service cman start
    $service fenced start
    $service clvmd start
    $service gfs start
    $service rgmanager start
    

    To stop:

    $service rgmanager stop
    $service gfs stop
    $service clvmd stop
    $service fenced stop
    $service cman stop
    $service ccsd stop
    

    Once all nodes have all the services started, check if each node has correct info and that they report that they can see each other as follows:

    $cman_tool nodes
    
  5. Next, configure a service for testing. Create a simple service for testing fail-over with system-config-cluster.

    First create a fail-over domain with the nodes.

    Create a service for a virtual IP address. Make sure it is not set to auto-start (the option is in the left-hand corner). In the upper right-hand corner, set the fail-over domain to the one that was just created. Save the file and propagate it, then run this command to start rgmanager: service rgmanager start.

  6. Next, test the service. Start and stop the service on each node with these commands:
    To enable a service
    $clusvcadm -e servicename
    
    To stop and disable a service
    $clusvcadm -d servicename
    

    To check if it is working, do these two tests:

    $ip addr
    $ping X.X.X.X
    

    Once that works, move to next step.

  7. Test fail-over now that fencing is enabled. There are two ways to test:
    1. Use manual fencing with the fence script located in /sbin/fence_*.

      The name of the fence device is defined in your cluster.conf file. The manual page can be read with man fence_nameoffencedevice.

    2. Send a test panic to a node that is running service.

      **This is a real-world example for testing fail-over. Make sure that nothing mission critical is running on that node.**

      Start the virtual IP service on a node, then manually crash that node. To set up the node so that it can be sent a test panic that will crash it, do the following.

      To turn sysrq on at boot time, open /etc/sysctl.conf and add or change the line kernel.sysrq = 1.

      If it is not desired to have sysrq on permanently, the following command will turn it on until the system is rebooted:

      $echo 1 > /proc/sys/kernel/sysrq
      

      Next, start the service on this node once sysrq is enabled by issuing this command:

      $clusvcadm -e servicename
      

      Now trigger the panic with the command echo 'c' > /proc/sysrq-trigger.

      If the fencing worked, the node should be rebooted and should be reported as offline. It may take a minute or two to get status and fence the node.

      Watch for missed heartbeats by opening a new terminal, and issuing the command tail -f /var/log/messages.

      Run this command once it is fenced and it should show the fenced node as offline: cman_tool nodes.

      Once it is offline the service should be on another node. Use the command ip addr to see this.

      In summary, the machine should panic and then be rebooted. If fail-over works and the IP address moves to a different machine, then clustering is working. Now start adding the services that are needed on the cluster.
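
When repeating the fail-over test, the "use ip addr to see this" check can be reduced to a yes/no answer. The helper below is a rough sketch under the assumption that ip addr prints each IPv4 address on a line of the form "inet ADDRESS/PREFIX ..."; has_ip is a hypothetical function name, and X.X.X.X stands for the virtual IP address of the test service, as elsewhere in this article.

```shell
# has_ip ADDRESS: read "ip addr" output on stdin and succeed only if
# ADDRESS is configured on some interface.
has_ip() {
    local pattern
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # The trailing "/" keeps 10.0.0.5 from matching 10.0.0.50.
    grep -Eq "^[[:space:]]*inet ${pattern}/"
}
```

On each surviving node, ip addr | has_ip X.X.X.X && echo "service IP is here" shows which node picked up the virtual IP address after the panicked node was fenced.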

    Links to Documentation

    Cluster/gfs overview

    Cluster

    Cluster sources Faq, lots of information

    Log cluster logs to a different file

    How do I install the Cluster Suite packages in RHEL 4 via RHN?

