
Monday, November 19, 2012

What does the clfileprop process do?

You may have seen the following process:

myserver:/:>ps -aef | grep clfile
root 4063286 5570752 0 00:59:56 - 0:00 /usr/es/sbin/cluster/utilities/clfileprop -a

This process, in essence,
- belongs to powerHA.
- runs every 10 minutes on my servers; I think this is the default.
- propagates changes to configuration files to all other nodes.
- caveat: if you run it manually on node A (e.g. when you run a verification), node A will propagate its files to the other nodes regardless of last-modified dates. See the example below.
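
Because of that caveat, be careful where you run it by hand. A minimal sketch of a manual run (the command path and the -a flag are taken from the ps output above; run it only on the node that holds the correct copies of the files):

# Force propagation of configuration files from THIS node to all
# other cluster nodes, regardless of last-modified dates.
/usr/es/sbin/cluster/utilities/clfileprop -a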

Wednesday, October 17, 2012

Good reference link of powerHA

This is a bookmark of powerHA links I find useful. Keeping them here just in case. :)

http://aix4admins.blogspot.sg/2011/10/commands.html

Friday, August 10, 2012

How to extract PowerHA configuration from ODM for quick recovery

In AIX, ODM holds a lot of information and configuration.

In the event that ODM goes kaput, all hell will break loose. Especially for powerHA, having a backup of the configuration will aid recovery from configuration issues.

As I'm still learning AIX and powerHA, do let me know if my method is good enough. :)

I have written a script to extract the powerHA configuration.

#!/bin/ksh
#
# Script Name : spool_HA_config.sh
# Written     : 08 Aug 2012
# Author      : Victor Kwan At gmail
#
# Description : This is to spool the powerHA 7 configuration on an
#               AIX 7.1 machine.
#               This script should be cron to run regularly for
#               quick recovery if powerHA configuration gets corrupted
#               in AIX ODM.
#
# Updates     : 08 Aug 2012 : First version
#             : 10 Aug 2012 : spooled files now uses DDMMYYYY_HHmmSS format
#


# Declarations
#
DATE=`date +'%d%m%Y_%H%M%S'`

# Safety Measure
#
WHO=`/usr/bin/whoami`

if [ "${WHO}" != "root" ]
then
        echo "You shouldn't be running this using ${WHO}! Script will now terminate."
        exit 1
fi

#
# Spool the HA configuration from ODM
/usr/es/sbin/cluster/utilities/clsnapshot -c -i -n HA_snap_`hostname`_${DATE} -d "HA snapshot on ${DATE}" >/dev/null 2>&1

# Ends
# ~


The main star of this script is the clsnapshot command. By default, the output of the clsnapshot command is saved in /usr/es/sbin/cluster/snapshots.

Below is a sample of the spooled files. There are two files, one *.odm and one *.info. I think both are required when importing back into powerHA if we need to recover from configuration issues.

-rw-r--r--    1 root     system        57482 Aug 10 01:00 HA_snap_serverA_10082012_010000.odm
-rw-r--r--    1 root     system        86579 Aug 10 01:00 HA_snap_serverA_10082012_010000.info



Of course, there are many things we need to keep watch on, so we wouldn't want to run this script manually. Hence, put it in root's cron to run daily.

# PowerHA configuration daily spool
0 1 * * * /myscript_folder/spool_HA_config.sh >/myscript_folder/spool_HA_config.output 2>&1
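
Since this spools two new files into /usr/es/sbin/cluster/snapshots every day, you may also want a small cleanup job. A sketch; the 30-day retention and the 2 am schedule are my own choices:

# Remove spooled HA snapshots older than 30 days
0 2 * * * /usr/bin/find /usr/es/sbin/cluster/snapshots -name 'HA_snap_*' -mtime +30 -exec rm -f {} \; >/dev/null 2>&1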


And we are done.

Monday, July 02, 2012

How to resolve LVM error in powerHA

In the event you run into the following error:

    cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group. 

This may be caused by the volume group being varied on on the other node. If it should not be varied on there, run the following on that node:

    # varyoffvg vg 

And then retry the LVM command.

But if it continues to be a problem, stop powerHA 7.1 on both nodes, export and re-import the volume group on both nodes, and then restart the cluster.
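
A rough sketch of that sequence, assuming a volume group called datavg on hdisk2 (both names are hypothetical; stop and start cluster services the way you normally do, e.g. smitty clstop / smitty clstart):

# On BOTH nodes, after stopping cluster services:
varyoffvg datavg            # only if the VG is still varied on
exportvg datavg             # removes the VG definition from ODM; data on the disks is untouched
importvg -y datavg hdisk2   # re-import the VG definition from the disk
varyoffvg datavg            # importvg may leave the VG varied on; vary it off so the cluster can manage it
# Then restart cluster services and let powerHA bring the VG online.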

Friday, June 29, 2012

AIX powerHA auto-verification

powerHA 7.1 automatically runs a verification every night, usually around midnight. With a very simple command you can check the status of this verification run:

# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1

If this shows a return code of 0, the cluster verification ran without any errors. Anything else, and you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your cluster status every day.
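
If you have several clusters, a small loop makes the daily check easier. A sketch, assuming password-less ssh as root to one node per cluster (the node names are hypothetical):

#!/bin/ksh
# Check last night's automatic cluster verification on each cluster
for node in clusterA_node1 clusterB_node1 clusterC_node1
do
        result=$(ssh ${node} "tail -10 /var/hacmp/log/clutils.log 2>/dev/null | grep detected | tail -1")
        echo "${node}: ${result:-no verification result found}"
done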

With the following smitty menu you can change the time when the auto-verification runs and whether it should produce debug output:

    # smitty clautover.dialog
                                                       [Entry Fields]
* Automatic cluster configuration verification         Enabled         +
* Node name                                            Default         +
* HOUR (00 - 23)                                      [00]             +#
  Debug                                                yes

You can check with:

    # odmget HACMPcluster
    # odmget HACMPtimersvc

Be aware that if you change the runtime of the auto-verification, you have to synchronize the cluster afterwards to update the other nodes in the cluster.

Source reference (if the original author does not agree with me posting this on my blog, please let me know :) ):

http://www.aixhealthcheck.com/blog.php?id=116

Thursday, June 28, 2012

How to check powerHA settings and events from ODM

In ODM, odmshow displays the object class definition when you query a particular object class. For example, here I queried the HACMPevent object class.

root@myserver:/> odmshow HACMPevent
class HACMPevent {
        char name[256];                              /* offset: 0xc ( 12) */
        char desc[256];                              /* offset: 0x10c ( 268) */
        short setno;                                 /* offset: 0x20c ( 524) */
        short msgno;                                 /* offset: 0x20e ( 526) */
        char catalog[256];                           /* offset: 0x210 ( 528) */
        char cmd[1024];                              /* offset: 0x310 ( 784) */
        char notify[1024];                           /* offset: 0x710 ( 1808) */
        char pre[1024];                              /* offset: 0xb10 ( 2832) */
        char post[1024];                             /* offset: 0xf10 ( 3856) */
        char recv[1024];                             /* offset: 0x1310 ( 4880) */
        short count;                                 /* offset: 0x1710 ( 5904) */
        long event_duration;                         /* offset: 0x1714 ( 5908) */
        };
/*
        descriptors:    12
        structure size: 0x1718 (5912) bytes
        data offset:    0x380
        population:     89 objects (89 active, 0 deleted)
*/

For example, to see which script runs when a node is attempting to join the cluster:

root@myserver:/> odmget -q name=node_up HACMPevent

HACMPevent:
        name = "node_up"
        desc = "Script run when a node is attempting to join the cluster."
        setno = 101
        msgno = 7
        catalog = "events.cat"
        cmd = "/usr/es/sbin/cluster/events/node_up"
        notify = ""
        pre = ""
        post = ""
        recv = ""
        count = 0
        event_duration = 0
 
 
And here are the powerHA 7.1 events from the ODM database:

root@myserver:/> odmget HACMPevent | awk '/name/ {print $3}' | sed 's/"//g'
swap_adapter
swap_adapter_complete
network_up
network_down
network_up_complete
network_down_complete
node_up
node_down
node_up_complete
node_down_complete
join_standby
fail_standby
acquire_service_addr
acquire_takeover_addr
get_disk_vg_fs
node_down_local
node_down_local_complete
node_down_remote
node_down_remote_complete
node_up_local
node_up_local_complete
node_up_remote
node_up_remote_complete
release_service_addr
release_takeover_addr
release_vg_fs
start_server
stop_server
config_too_long
event_error
reconfig_topology_start
reconfig_topology_complete
reconfig_resource_release
reconfig_resource_release_primary
reconfig_resource_release_secondary
reconfig_resource_acquire_secondary
reconfig_resource_complete_secondary
reconfig_resource_release_fence
reconfig_resource_acquire_fence
reconfig_resource_acquire
reconfig_resource_complete
migrate
migrate_complete
acquire_aconn_service
swap_aconn_protocols
get_aconn_rs
release_aconn_rs
server_restart
server_restart_complete
server_down
server_down_complete
rg_move
rg_move_release
rg_move_acquire
rg_move_fence
rg_move_complete
site_down
site_down_complete
site_down_local
site_down_local_complete
site_down_remote
site_down_remote_complete
site_up
site_up_complete
site_up_local
site_up_local_complete
site_up_remote
site_up_remote_complete
site_merge
site_merge_complete
site_isolation
site_isolation_complete
fail_interface
join_interface
cluster_notify
resource_add
resource_modify
resource_delete
resource_online
resource_offline
resource_state_change
resource_state_change_complete
external_resource_state_change
external_resource_state_change_complete
intersite_fallover_prevented
reconfig_configuration_complete
forced_down_too_long
start_udresource
stop_udresource
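
Since all of these event definitions live in the ODM, a plain-text dump of the whole class makes a handy reference (and a crude backup). A minimal sketch; the output path and date suffix are my own choices:

# Dump all powerHA event definitions from ODM to a dated text file
odmget HACMPevent > /tmp/HACMPevent.$(date +%d%m%Y).txt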
 

Monday, June 25, 2012

How to mount a logical volume from another node in the cluster

Stop using the logical volume on the first node:

# umount <filesystem>
# varyoffvg <vgname>


Import the volume group on node 2:


# importvg -L <vgname> <hdisk>
# varyonvg <vgname>

Please note:

The 'L' option takes a volume group and learns about possible changes performed to that volume group. Any new logical volumes created as a result of this command emulate the ownership, group identification, and permissions of the /dev special file for the volume group listed in the -y flag. The -L flag performs the functional equivalent of the -F and -n flags during execution.

Restrictions:

    * The volume group must not be in an active state on the system executing the -L flag.
    * The volume group's disks must be unlocked on all systems that have the volume group varied on and operational. Volume groups and their disks may be unlocked, remain active and used via the varyonvg -b -u command.
    * The physical volume name provided must be of a good and known state, the disk named may not be in the missing or removed state.
    * If a logical volume name clash is detected, the command will fail. Unlike the basic importvg actions, clashing logical volume names will not be renamed.
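
As a concrete example of the whole sequence, assuming a volume group datavg on hdisk2 with a filesystem /data (all names hypothetical):

# On node 1: stop using the LV, then vary the VG off
umount /data
varyoffvg datavg

# On node 2: learn any changes made to the VG, vary it on, and mount
importvg -L datavg hdisk2
varyonvg datavg
mount /data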


Extra Info:

The steps also assume that the major number of the volume group is the same on both nodes. Otherwise, watch out for potential issues when importing the VG on node 2. You can check the available major numbers using the 'lvlstmajor' command.
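
A quick sketch of handling the major number explicitly (the major number 55 and the device names are hypothetical; importvg's -V flag imports the VG using a specific major number):

# On both nodes: list the major numbers already in use and the next free ones
lvlstmajor

# On node 2: import the VG with the same major number it has on node 1
importvg -V 55 -y datavg hdisk2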
Now the LVs can be mounted and used.

# mount <filesystem>

If you have configured powerHA, this should be taken care of transparently during a switchover. The steps above are the crude and manual way of doing what powerHA can do.