Wednesday, March 21, 2012

VCS Cluster Communications Guide

VCS Cluster Communications

In HP's MC/Serviceguard, cluster membership is defined in the primary cluster configuration file - simply an ASCII file that the administrator edits. Cluster communications take place over any network path between the cluster nodes.

Veritas Cluster Server's communications are more complex. But once you have everything set up, in theory, you should never have to worry about it again. However, troubleshooting VCS cluster communications without the requisite knowledge would be a very entertaining exercise - to watch...

This page seeks to document the VCS communications elements and how to go about updating/modifying them, as needed. The first section is an overview - how the pieces tie together.

Following that, we'll work our way up from the bottom - examining LLT first, then GAB.

Overview

To begin the overview, refer to the picture:

[Figure: the VCS communications stack - agents, HAD, GAB, and LLT]

VCS agents track the state of all resources and service groups in the cluster. _HAD_ polls the various agents on the node and, if there's a change, reports that to _GAB_. _GAB_ (Group Membership Services/Atomic Broadcast) has two jobs.

First, it tracks which systems are part of the cluster. Cluster membership is defined by systems sharing the same cluster ID and a pair of redundant Ethernet LLT cables.

GAB's second job is to transmit resource status changes to all nodes in the cluster.

The atomic broadcast portion of the name implies (correctly, as it turns out) that all systems in the cluster are notified of any changes. If a failure occurs during the update, the "status change" is rolled back, ensuring that, upon recovery, all nodes have the same status information. It's the same paradigm as a database commit, if that's familiar to you.

_LLT_ is responsible for transmitting the heartbeat signals which GAB uses to maintain cluster membership.

A cluster can have between 2 and 8 LLT links. LLT links can be configured as either low or high priority (an example llttab entry for a low-priority link follows the lists below).

- High priority links:
  - Send a heartbeat every 0.5 seconds
  - Carry cluster status information
  - Should be configured over dedicated network links

- Low priority links:
  - Send a heartbeat every second.
  - Do not carry cluster status information.
  - Can be configured on public networks.
  - Will be automatically promoted to high priority links if all other high priority links have failed.
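
For illustration, a low-priority link is added in _/etc/llttab_ with the link-lowpri directive. This is a sketch rather than something from the original article; the device names (qfe, hme, ce) are placeholders:

# two dedicated private links, plus a low-priority heartbeat over the public NIC
# (ce0 here is illustrative - substitute your own interfaces)
link        qfe0 /dev/qfe:0 - ether - -
link        hme0 /dev/hme:0 - ether - -
link-lowpri ce0  /dev/ce:0  - ether - -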

LLT


LLT is the lowest protocol in the VCS communications chain so everything else relies on it. If LLT isn't happy, ain't nothing happening - so, let's make LLT happy.

_/etc/llttab_

_/etc/llttab_ is LLT's primary configuration file. At a minimum, it specifies the system ID, cluster ID, and the local links that LLT uses for heartbeat signals.

You can see what other options are available via the sample llttab file in _/opt/VRTSllt/llttab_ and in the llttab man page. Since the file defines host-specific entries, it must be unique to the host. No rdist'ing this file...

# cat /etc/llttab
set-node       1
set-cluster    10
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -

set-node

Ensure that there is only one _set-node_ directive in the file. The value for set-node can be either a number (0-31) or a system name. If you use the system name, the name must resolve to a unique number in the _/etc/llthosts_ file.

For instance:

# cat /etc/llttab
set-node       athena
set-cluster    10
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
# cat /etc/llthosts
0 athena
1 zeus
2 othello
3 ...

If you decide to use the _/etc/llthosts_ file, the following rules apply:

- The file must be synced across all nodes in the cluster (a quick check is sketched below).
- The node numbers must be unique; otherwise, the cluster won't start.
- The system names used must match those in the _/etc/llttab_, _/etc/VRTSvcs/conf/main.cf_ and _/etc/VRTSvcs/conf/sysname_ (if used) files.
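
As a quick sanity check - a sketch that isn't part of the original write-up, with illustrative host names - you can compare the file's checksum on every node from a single login:

# host names (athena, zeus) are examples; use your own node list
for host in athena zeus; do
    echo "== $host =="
    ssh $host sum /etc/llthosts
done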

set-cluster

The set-cluster directive must specify a number that is unique across any clusters the LLT heartbeat can reach. This implies that you can have multiple clusters sharing the same LLT links. This makes it easier on the networking people in that you can have four 4-node clusters sharing the same two 16-port switches, thereby reducing the overall hardware costs.
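
As a sketch of that idea (the cluster IDs, node names, and devices here are illustrative, not from the article), two clusters sharing the same switches simply use different set-cluster values in their respective _/etc/llttab_ files:

# node in the first cluster
set-node       athena
set-cluster    10
link qfe0 /dev/qfe:0 - ether - -

# node in the second cluster, cabled to the same pair of switches ("apollo" is a made-up name)
set-node       apollo
set-cluster    20
link qfe0 /dev/qfe:0 - ether - -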

_/etc/VRTSvcs/conf/sysname_

The sysname file eliminates VCS's dependence on the UNIX uname command to identify the hostname. The problem is that some OSs will report a fully qualified domain name for a "uname -n" command. If that's the case, the system name won't match the name in the main.cf file and VCS will puke.

If you use the sysname file, ensure the system name matches the one used in the llttab and llthosts files - assuming you're using the system name there as well. At a minimum, it must match the name in the main.cf file.
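
A minimal sketch of setting it up - assuming the short name "athena" from the earlier examples and the _/etc/VRTSvcs/conf/sysname_ location:

# "uname -n" returns the FQDN on this box, so pin the short name VCS should use
uname -n
echo "athena" > /etc/VRTSvcs/conf/sysname
cat /etc/VRTSvcs/conf/sysname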

LLT commands


OK; now that we have all the LLT configuration files set up, it's time to start LLT:

# lltconfig -c

Now that it's running, how do we verify that?

# lltconfig -a list
llt is running

The "lltstat -nvv" command displays some verbose information on the status of LLT. You can use this command to verify that all LLT links are operational.

The training manual suggests executing the command periodically via cron and looking for DOWN links. See the man page for additional options.
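
A rough sketch of such a cron job; the lltstat path, the schedule, and the mail command are assumptions rather than anything from the article:

# crontab entry: every 15 minutes, mail root if any LLT link is reported DOWN (paths are assumptions)
0,15,30,45 * * * * /sbin/lltstat -nvv | grep DOWN && echo "LLT link DOWN on `hostname`" | mailx -s "LLT link DOWN" root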


GAB


GAB is the next rung up the protocol ladder; however, it's easier to configure than LLT.

_/etc/gabtab_

The gabtab file is the primary (and only) means of configuring GAB. It specifies the command line used to start GAB. As such, in order to start GAB, you just run the /etc/gabtab file.
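
A typical gabtab consists of a single gabconfig line. This example assumes a two-node cluster; -n specifies how many nodes must check in before GAB seeds:

# cat /etc/gabtab
/sbin/gabconfig -c -n2        # -n2 assumes two nodes - adjust for your cluster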


GAB Status


Use "gabconfig -a" to check the status of GAB.

# gabconfig -a
GAB Port Membership
===============================================
Port a gen b38f123 membership 01        ; 2       01
Port h gen f00b123 membership 01        ; 2       01

There's quite a bit of useful information in the list above. For instance, the _Port a_ line indicates that GAB is communicating (which automatically means that LLT is fully functional) and has a membership of nodes 0, 1, 12, 20, and 21.

The membership list uses ';' for 10's markers, with the last digit of the node number shown in its column if that node is actually in use. The _Port h_ line indicates that HAD is started and has a similar membership to GAB.

Manual Seeding


GAB will normally handle the seeding of the cluster nodes automatically based on the information in the _/etc/gabtab_. If, however, you have a node down for maintenance and need to restart your cluster, GAB will effectively hang waiting for the last system to come alive.

To circumvent this issue, you will need to manually seed the cluster (a consolidated sketch follows the list). To do that, do the following:

- On one node, and one node only, execute: gabconfig -c -x
- On the remaining available nodes, execute: gabconfig -c
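
Put together as a rough walk-through, using only the commands above, with a final check that membership looks sane:

# on the first node only - force GAB to seed without waiting for the down node
gabconfig -c -x

# on every other node that is currently up
gabconfig -c

# then confirm that port a (and, once had starts, port h) shows the expected members
gabconfig -a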

----
Reference:
[http://olearycomputers.com/ll/vcs/vcs_comms.html]
