Monday, November 19, 2012
What does clfileprop process do
You may have seen the following process:
myserver:/:>ps -aef | grep clfile
root 4063286 5570752 0 00:59:56 - 0:00 /usr/es/sbin/cluster/utilities/clfileprop -a
This process, in essence,
- belongs to powerHA.
- runs every 10 minutes on my servers. I believe this is the default.
- propagates changes to configuration files to all other nodes.
- caveat: if you run it manually on node A (for example, as part of a verification), node A will propagate its copies of the files to the other nodes regardless of their last-modified dates.
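To see which file collections are being propagated, you can query the ODM directly. This is only a sketch: the HACMPfilecollection object class name is my assumption about where powerHA keeps file collections, so verify it on your own level first. The -a flag is the one shown in the ps output above.
# List the file collections defined in the ODM
# (assumption: powerHA stores them in the HACMPfilecollection class)
odmget HACMPfilecollection

# Trigger a propagation by hand -- per the caveat above, this pushes the
# local copies to all other nodes regardless of modification time
/usr/es/sbin/cluster/utilities/clfileprop -a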
Wednesday, October 17, 2012
Good reference link of powerHA
This is a bookmark of powerHA links that I find useful. Keeping them here just in case. :)
http://aix4admins.blogspot.sg/2011/10/commands.html
Friday, August 10, 2012
How to extract PowerHA configuration from ODM for quick recovery
In AIX, the ODM holds a lot of information and configuration.
In the event that the ODM goes kaput, all hell will break loose. Especially for powerHA, having a backup of the configuration will aid in recovering from powerHA configuration issues.
As I'm still learning AIX and powerHA, do let me know if my method is good enough. :)
I have written the script below to extract the powerHA configuration.
#!/bin/ksh
#
# Script Name : spool_HA_config.sh
# Written : 08 Aug 2012
# Author : Victor Kwan At gmail
#
# Description : This is to spool the powerHA 7 configuration on a
# AIX 7.1 machine.
# This script should be cron to run regularly for
# quick recovery if powerHA configuration gets corrupted
# in AIX ODM.
#
# Updates : 08 Aug 2012 : First version
# : 10 Aug 2012 : spooled files now uses DDMMYYYY_HHmmSS format
#
# Declarations
#
DATE=`date +'%d%m%Y_%H%M%S'`
# Safety Measure
#
WHO=`/usr/bin/whoami`
if [ ${WHO} != root ]
then
    echo "You shouldn't be running this using ${WHO}! Script will now terminate."
    exit 1
fi
#
# Spool the HA configuration from ODM
/usr/es/sbin/cluster/utilities/clsnapshot -c -i -n HA_snap_`hostname`_${DATE} -d "HA snapshot on ${DATE}" >/dev/null 2>&1
# Ends
The main star in this script is the clsnapshot command. By default, the output of clsnapshot command will be saved at /usr/es/sbin/cluster/snapshots.
Below is a sample of the files spooled. There are two files, one *.odm and one *.info. I think both are required to be imported into powerHA if we need to recover from configuration issues.
-rw-r--r-- 1 root system 57482 Aug 10 01:00 HA_snap_serverA_10082012_010000.odm
-rw-r--r-- 1 root system 86579 Aug 10 01:00 HA_snap_serverA_10082012_010000.info
Of course, there are many other things we need to keep watch on, so we wouldn't want to run this script manually. Hence, put it in root's cron to run daily:
# PowerHA configuration daily spool
0 1 * * * /myscript_folder/spool_HA_config.sh >/myscript_folder/spool_HA_config.output 2>&1
and we are done.
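Since this runs daily, the snapshot directory will slowly fill up. A small housekeeping sketch, assuming the default /usr/es/sbin/cluster/snapshots location from above and a 30-day retention that you can adjust:
# List what has accumulated so far
ls -ltr /usr/es/sbin/cluster/snapshots

# Remove snapshot files older than 30 days (adjust -mtime to taste)
find /usr/es/sbin/cluster/snapshots -name 'HA_snap_*' -mtime +30 -exec rm {} \;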
Monday, July 02, 2012
How to resolve LVM error in powerHA
In the event you run into the following error:
cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group.
This may be caused by the volume group being varied on on the other node. If it should not be varied on there, run the following on that node:
# varyoffvg vg
And then retry the LVM command.
But if it continues to be a problem, stop powerHA 7.1 on both nodes, export and re-import the volume group on both nodes, and then restart the cluster (see the sketch below).
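A minimal sketch of that export/re-import, to be run on both nodes after cluster services are stopped (e.g. via smitty clstop). The names datavg and hdisk2 are placeholders for your own volume group and disk, and the exact flags for concurrent volume groups may differ on your level:
# Make sure the VG is offline, then remove its definition from the ODM
varyoffvg datavg
exportvg datavg

# Re-import it from one of its disks (use lspv to find the hdisk holding the VG)
importvg -y datavg hdisk2

# Leave it varied off again so cluster services can manage it on restart
varyoffvg datavg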
Friday, June 29, 2012
AIX powerHA auto-verification
powerHA 7.1 automatically runs a verification every night, usually around midnight. With a very simple command you can check the status of this verification run:
# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1
If this shows a return code of 0, the cluster verification ran without any errors. Anything else, and you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day; a sketch of such a loop follows below.
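A small sketch of that daily check across several clusters, assuming passwordless ssh as root to one node per cluster (nodeA, nodeB, nodeC are placeholder hostnames):
#!/bin/ksh
# Check last night's auto-verification result on one node of each cluster
for NODE in nodeA nodeB nodeC
do
    RESULT=$(ssh ${NODE} "tail -10 /var/hacmp/log/clutils.log 2>/dev/null | grep detected | tail -1")
    echo "${NODE}: ${RESULT:-no verification entry found}"
done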
With the following smitty menu you can change the time when the auto-verification runs and if it should produce debug output or not:
# smitty clautover.dialog
                                                 [Entry Fields]
* Automatic cluster configuration verification    Enabled     +
* Node name                                       Default     +
* HOUR (00 - 23)                                  [00]        +#
  Debug                                           yes
You can check with:
# odmget HACMPcluster
# odmget HACMPtimersvc
Be aware that if you change the runtime of the auto-verification, you have to synchronize the cluster afterwards to update the other nodes in the cluster.
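As a sketch, the synchronization can also be kicked off from the command line; I'm assuming the clmgr interface that ships with powerHA SystemMirror 7.1 here, so double-check the syntax on your level (otherwise use the usual smitty verification and synchronization path):
# Synchronize the cluster definition to all nodes
# (assumption: clmgr is available, as on powerHA SystemMirror 7.1)
clmgr sync cluster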
Source Reference (if the original author does not agree with me posting this on my blog, please let me know. :) )
http://www.aixhealthcheck.com/blog.php?id=116
Thursday, June 28, 2012
How to check powerHA settings and events from ODM
In the ODM, odmshow displays the definition of a particular object class. For example, here I queried the HACMPevent object class.
root@myserver:/> odmshow HACMPevent
class HACMPevent {
        char name[256];                /* offset: 0xc ( 12) */
        char desc[256];                /* offset: 0x10c ( 268) */
        short setno;                   /* offset: 0x20c ( 524) */
        short msgno;                   /* offset: 0x20e ( 526) */
        char catalog[256];             /* offset: 0x210 ( 528) */
        char cmd[1024];                /* offset: 0x310 ( 784) */
        char notify[1024];             /* offset: 0x710 ( 1808) */
        char pre[1024];                /* offset: 0xb10 ( 2832) */
        char post[1024];               /* offset: 0xf10 ( 3856) */
        char recv[1024];               /* offset: 0x1310 ( 4880) */
        short count;                   /* offset: 0x1710 ( 5904) */
        long event_duration;           /* offset: 0x1714 ( 5908) */
        };
/*
        descriptors:    12
        structure size: 0x1718 (5912) bytes
        data offset:    0x380
        population:     89 objects (89 active, 0 deleted)
*/
For example, to see what script is run when a node is attempting to join the cluster:
root@myserver:/> odmget -q name=node_up HACMPevent

HACMPevent:
        name = "node_up"
        desc = "Script run when a node is attempting to join the cluster."
        setno = 101
        msgno = 7
        catalog = "events.cat"
        cmd = "/usr/es/sbin/cluster/events/node_up"
        notify = ""
        pre = ""
        post = ""
        recv = ""
        count = 0
        event_duration = 0
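If you only want to know which events have notify, pre- or post-event commands attached, you can filter the same output; a small sketch using the field names shown above:
# Show each event name together with any notify/pre/post commands attached to it
odmget HACMPevent | egrep 'name =|notify =|pre =|post ='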
and these are the powerHA 7.1 events from the ODM database:
root@myserver:/> odmget HACMPevent | awk '/name/ {print $3}' | sed 's/"//g'
swap_adapter
swap_adapter_complete
network_up
network_down
network_up_complete
network_down_complete
node_up
node_down
node_up_complete
node_down_complete
join_standby
fail_standby
acquire_service_addr
acquire_takeover_addr
get_disk_vg_fs
node_down_local
node_down_local_complete
node_down_remote
node_down_remote_complete
node_up_local
node_up_local_complete
node_up_remote
node_up_remote_complete
release_service_addr
release_takeover_addr
release_vg_fs
start_server
stop_server
config_too_long
event_error
reconfig_topology_start
reconfig_topology_complete
reconfig_resource_release
reconfig_resource_release_primary
reconfig_resource_release_secondary
reconfig_resource_acquire_secondary
reconfig_resource_complete_secondary
reconfig_resource_release_fence
reconfig_resource_acquire_fence
reconfig_resource_acquire
reconfig_resource_complete
migrate
migrate_complete
acquire_aconn_service
swap_aconn_protocols
get_aconn_rs
release_aconn_rs
server_restart
server_restart_complete
server_down
server_down_complete
rg_move
rg_move_release
rg_move_acquire
rg_move_fence
rg_move_complete
site_down
site_down_complete
site_down_local
site_down_local_complete
site_down_remote
site_down_remote_complete
site_up
site_up_complete
site_up_local
site_up_local_complete
site_up_remote
site_up_remote_complete
site_merge
site_merge_complete
site_isolation
site_isolation_complete
fail_interface
join_interface
cluster_notify
resource_add
resource_modify
resource_delete
resource_online
resource_offline
resource_state_change
resource_state_change_complete
external_resource_state_change
external_resource_state_change_complete
intersite_fallover_prevented
reconfig_configuration_complete
forced_down_too_long
start_udresource
stop_udresource
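As a quick sanity check, the number of event objects should match the population figure that odmshow HACMPevent reported above (89 on my cluster); a one-liner sketch:
# Count the event objects in the HACMPevent class -- should match the
# "population" figure reported by odmshow HACMPevent
odmget HACMPevent | grep -c 'name ='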
Monday, June 25, 2012
How to mount logical volume from another node in the cluster
Stop using the logical volume on the first node:
# umount <filesystem>
# varyoffvg <vgname>
Import the volume group on node 2:
# importvg -L <vgname> <hdisk>
# varyonvg <vgname>
Please note the following restrictions:
* The volume group must not be in an active state on the system executing the -L flag.
* The volume group's disks must be unlocked on all systems that have the volume group varied on and operational. Volume groups and their disks may be unlocked, and remain active and in use, via the varyonvg -b -u command.
* The physical volume name provided must be in a good and known state; the disk named may not be in the missing or removed state.
* If a logical volume name clash is detected, the command will fail. Unlike the basic importvg actions, clashing logical volume names will not be renamed.
Extra Info:
Now the LVs can be mounted and used:
# mount <filesystem>
If you have configured powerHA, this should be taken care of and be transparent during a switch-over. The steps above are the crude, manual way of doing what powerHA does for you; a consolidated sketch follows.
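For reference, here is the manual sequence above in one place, with placeholder names (datavg, /data, hdisk2) standing in for your own volume group, filesystem and disk:
# --- on node 1: stop using the LV and release the VG ---
umount /data
varyoffvg datavg

# --- on node 2: refresh the VG definition and bring it online ---
importvg -L datavg hdisk2     # -L re-learns the VG without a full export/import
varyonvg datavg
mount /data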