# du -sk /opt/* | sort +0nr | head -5
25456556 /opt/data
12634192 /opt/read
5483564 /opt/download
196104 /opt/scripts
132964 /opt/freeware
Tip: to include hidden files and directories, try this.
# du -sk .[a-z]* * | sort +0nr | head -5
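The `sort +0nr` form is the old-style key syntax. It works on AIX, but some newer sort implementations reject it. Here is a minimal sketch of a more portable variant, assuming a POSIX sort; `top_n` is a helper name made up for illustration.

```shell
# top_n: read "size path" lines (as printed by du -sk) on stdin and
# print the N largest, biggest first. N defaults to 5.
top_n() {
    sort -k1,1nr | head -n "${1:-5}"
}

# Usage, including hidden entries. The pattern .[!.]* matches dot-names
# other than . and .. (unlike .[a-z]*, it also catches names such as .Xauthority).
# du -sk /opt/.[!.]* /opt/* 2>/dev/null | top_n 5
```

Keeping the sort in a small function makes it easy to reuse with any command that prints a number in the first column.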
Tuesday, May 28, 2013
How to list the top 5 largest directories
Wednesday, May 22, 2013
How to un-mount a volume forcefully
If you ever need to unmount a volume forcefully because the system will not allow a normal unmount, first find out who is holding on to the resource, terminate those processes and then unmount. This is very important to prevent data loss.
Who is holding on to the volume
# lsof | grep "/opt/download"
# fuser -cu /opt/data/System.log
Now terminate them
# kill -9 <PID>
Take the PID from the fuser or lsof output above.
Now unmount the volume peacefully
# umount /opt/download
If you do not care who is still active and want to free the volume right away, kill everything that is using it.
# fuser -km /opt/download
In case you want to unmount an NFS volume that is unreachable, try this.
# umount -f /opt/download
Done...
Tuesday, May 21, 2013
How to retrieve the list of failed logins in AIX
short and sharp.
# /usr/sbin/acct/fwtmp < /etc/security/failedlogin | more
Wednesday, April 17, 2013
AIX su restriction using sugroup
Thursday, October 18, 2012
My understanding of RBAC in AIX
What is RBAC?
It stands for Role Based Access Control. There are major differences between RBAC in AIX 5.3 and older and RBAC in AIX 6.1/7.1. There is no value in discussing the older RBAC, so I will explain "enhanced RBAC" instead.
Three primary rules are defined for RBAC:
- Role assignment: A subject can exercise a permission only if the subject has selected or been assigned a role.
- Role authorization: A subject's active role must be authorized for the subject. With rule 1 above, this rule ensures that users can take on only roles for which they are authorized.
- Permission authorization: A subject can exercise a permission only if the permission is authorized for the subject's active role. With rules 1 and 2, this rule ensures that users can exercise only permissions for which they are authorized.
The traditional DAC
Traditional access control, which we call DAC (discretionary access control), has been used for ages and is taken for granted. The familiar string r-x------ is fundamental for every sysadmin. DAC provides SUID, SGID, etc., but the control scope only deals with owner, group or all (other) access.
AIX RBAC
RBAC provides precise access control such that the target role can only be assumed by a particular user. The range of commands the role can access can be a subset of all the commands that root or any other account has.
Difference from SUDO
SUDO is another means to control access to privileged commands. However, it can be tedious to configure each and every command that you want to allow an account to access.
Difference from Solaris RBAC
In essence, Solaris RBAC and AIX RBAC are similar. The main difference is the way they are implemented. In Solaris, we mainly use the following files to set up RBAC.
root:/ #ls -l /etc/user_attr /etc/security/exec_attr /etc/security/prof_attr /etc/security/auth_attr
-rw-r--r-- 1 root sys 11855 Mar 28 2012 /etc/security/auth_attr
-rw-r--r-- 1 root sys 20934 Aug 15 11:40 /etc/security/exec_attr
-rw-r--r-- 1 root sys 8433 Aug 15 11:42 /etc/security/prof_attr
-rw-r--r-- 1 root sys 1292 Aug 30 12:01 /etc/user_attr
Authorisation file for Solaris.
root:/ #tail -3 /etc/security/auth_attr
solaris.system.:::Machine Administration::help=SysHeader.html
solaris.system.date:::Set Date & Time::help=SysDate.html
solaris.system.shutdown:::Shutdown the System::help=SysShutdown.html
In AIX, this authorisation list is kept in a database. You can create custom ones, especially for those not already in the database. AIX doesn't provide the help HTML files. In reality, do we use them?
server:/: lsauth ALL | tail -3
wpar.mobility.appli id=10014
wpar.mobility.appli.other id=10016
wpar.mobility.appli.owner id=10015
The Solaris file that manages the effective privilege level used to execute a command:
root:/ #tail -3 /etc/security/exec_attr
Zone Management:solaris:cmd:::/usr/sbin/zoneadm:uid=0
Zone Management:solaris:cmd:::/usr/sbin/zonecfg:uid=0
DisasterRecovery Admin:suser:cmd:::/opt/sysadmin/Portnet_DR_Scripts/*:uid=root
Next is the profile file. It carries little meaning beyond maintaining the profile name and the description of the role.
root:/: #tail -5 /etc/security/prof_attr
ZFS Storage Management:::Create and Manage ZFS Storage Pools:help=RtZFSStorageMngmnt.html
Zone Management:::Zones Virtual Application Environment Administration:help=RtZoneMngmnt.html
dtwm:::Do not assign to users. Actions and commands required for the window manager (dtwm).:help=Rtdtwm.html
shutdown:::Do not assign to users. Contains actions requiring shutdown authorization.:auths=solaris.system.shutdown;help=Rtshutdown.html
DisasterRecovery Admin:::For running DisasterRecovery scripts:
In AIX, here is the equivalent, though we can set much more information, like password control, access to smitty and so on.
server:/: lsrole -f appadmin
appadmin:
        authorizations=aix.system.cluster
        rolelist=
        groups=admingrp
        visibility=1
        screens=*
        dfltmsg=role to manage Application resources
        msgcat=
        auth_mode=NONE
        id=11
In Solaris, this is the file that assigns who can assume the role.
root:/ #tail -3 /etc/user_attr
me::::type=normal;profiles=DNS Admin
you::::type=normal;profiles=DNS Admin
her::::type=normal;profiles=DNS Admin
AIX keeps this information in the ODM too.
server:/: lsuser -f meuser | grep role
        default_roles=
        roles=appadmin
How to setup
Say, for instance, powerHA can only be accessed by root, but to allow menu control of cluster resources, we need a means to start/stop/restart/suspend/resume/failover the resources without using root. It is a bad security idea to let a menu manage the cluster resources via the root account. Hence, we authorise, say, meuser to access the powerHA administrative commands by giving it the ibm.hacmp.admin authorisation. How do we do that?
Check that Enhanced RBAC is enabled.
# lsattr -El sys0 -a enhanced_RBAC
enhanced_RBAC true Enhanced RBAC Mode True
Let's create the authorisations.
/:> mkauth dfltmsg='IBM custom' ibm
/:> mkauth dfltmsg='IBM custom hacmp' ibm.hacmp
/:> mkauth dfltmsg='IBM custom hacmp admin' ibm.hacmp.admin
Then check which privileges the commands you are using require.
# tracepriv -ef /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State          Node
-----------------------------------------------------------------------------
apps_rg        ONLINE         servera
               OFFLINE        serverb
9568366: Used privileges for /usr/es/sbin/cluster/utilities/clRGinfo:
  PV_AU_ADMIN  PV_NET_CNTL  PV_NET_PORT
# tracepriv -ef /usr/es/sbin/cluster/events/utils/cl_RMupdate
...
...
...
Note: If you need to use your own shell script, you may need to add it into the privileged command database. Allow the EUID to be equal to the owner of that script.
Now we add the commands into the privileged command database.
/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_NET_PORT,PV_NET_CNTL accessauths=ibm.hacmp.admin /usr/es/sbin/cluster/utilities/clRGinfo
/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_KER_ACCT,PV_PROC_PRIV accessauths=ibm.hacmp.admin euid=0 /usr/es/sbin/cluster/events/utils/cl_RMupdate
/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_KER_ACCT,PV_PROC_PRIV accessauths=ibm.hacmp.admin euid=0 /admin.sh
/:> setsecattr -c innateprivs=PV_AU_ADMIN accessauths=ibm.hacmp.admin euid=0 /dlpar.sh
You can verify by using lssecattr.
/:> lssecattr -F -c /dlpar.sh
/dlpar.sh:
        euid=0
        accessauths=ibm.hacmp.admin
        innateprivs=PV_AU_ADMIN
Now, we create a role with the above authorisations.
# mkrole authorizations=ibm.hacmp.admin dfltmsg="Custom role to do admin with hacmp" appadmin
Note: If it's for automation, you may want to remove password access to the role with the following command.
chrole auth_mode=NONE appadmin
Next, allow meuser to be able to assume the role
chuser roles=appadmin meuser
Before you try it out, you need to update the kernel security tables for all of this to take effect. The AIX kernel is RBAC-aware for all the IBM system commands, so changes do not take effect until the tables are reloaded.
setkst
Try it out
swrole appadmin
If you are not allowed to assume the role, you will receive the following error. In this example, thatuser should not assume the appadmin role.
server:/HAapps: su - thatuser
-bash-3.2$ swrole appadmin
swrole: 1420-052 appadmin is not a valid role for thatuser.
meuser is authorised to assume the appadmin role instead.
server:/HAapps: su - meuser
-bash-3.2$ swrole appadmin
bash-3.2$ /usr/es/sbin/cluster/events/utils/cl_RMupdate suspend_appmon apps apps_rg
Suspend HA Monitoring for apps.
2012-10-22T15:58:03.289727
2012-10-22T15:58:03.309369
Oct 22 2012 15:58:03 cl_RMupdate: Completed request to suspend monitor(s) for application apps.
Oct 22 2012 15:58:03 cl_RMupdate: The following monitor(s) are in use for application apps:
apps_svr apps_dm
Reference: http://aixhelp.blogspot.sg/2010/12/aix6-rbac.html
Monday, October 01, 2012
My one liner to extracting lines using sed, perl or awk
While working on extracting data from large numbers of files, I have compiled some commands over the years that really help a lot.
Most of the time, we use head, tail and grep. However, these commands are good at wholesale extraction or extraction by keyword. For more complex extraction, we may use sed, perl or awk instead.
Using myfile as an example,
myserver:/tmp/:>head -10 myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
#
# Licensed Materials - Property of IBM
#
# COPYRIGHT International Business Machines Corp. 1985,1989
# All Rights Reserved
#
myserver:/tmp:>tail -2 myfile
10.1.1.123 host1
10.2.1.124 host2
myserver:/tmp/:>grep host1 myfile
10.1.1.123 host1
Say, for more complicated stuff, like extracting the 2nd line PLUS the 5th to 7th lines, I find it tough to code using the above commands.
h2. sed, perl or awk?
Do note that sed will traverse the entire file, hence if you have a very large file, this might take some time.
Say we want to extract the 2nd line. We can use sed or awk.
myserver:/tmp/:>sed 2p myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
...
...
myserver:/tmp/:>awk 'NR==2' myfile
# This is an automatically generated prolog.
If you have tried it out, you will see that with sed, the 2nd line is indeed extracted, but the rest of the file is also printed out! Use the -n flag to suppress the default printing of every line.
myserver:/tmp/:>sed -n 2p myfile
# This is an automatically generated prolog.
Alternatively, you can delete everything that you don't want by using '!d'.
myserver:/tmp/:>sed '2!d' myfile
# This is an automatically generated prolog.
I wouldn't want to use this method as I have difficulty converting the line to use variables. Do give me suggestions or advice if you think otherwise. I don't claim to be an expert in writing scripts. :)
IMPORTANT: Note that the single quotes are required. Otherwise, in shells with history expansion such as bash, '!d' will be replaced by the last command you executed that starts with the letter 'd'.
If you want one and only one line from the file, you can get awk to exit after printing that line; otherwise awk will traverse the whole file.
myserver:/tmp/:>awk 'NR==6 {print; exit}' myfile
# Licensed Materials - Property of IBM
If we try to extract line 5 to 7 using sed or awk
myserver:/tmp/:>sed -n 5,7p myfile
#
# Licensed Materials - Property of IBM
#
myserver:/tmp/:>awk 'NR==5,NR==7' myfile
#
# Licensed Materials - Property of IBM
#
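On the difficulty with variables mentioned earlier: one common approach is to put the sed expression in double quotes so the shell expands the variables first, or to pass the values to awk with -v. Here is a minimal sketch; the function names `print_lines` and `print_lines_awk` are made up for illustration.

```shell
# print_lines FILE START [END]: print lines START..END of FILE.
# END defaults to START, so two arguments print a single line.
print_lines() {
    file=$1 start=$2 end=${3:-$2}
    # Double quotes let the shell expand the variables before sed sees them.
    sed -n "${start},${end}p" "$file"
}

# The same with awk: the numbers are passed in with -v, and awk exits
# as soon as it is past the last wanted line instead of reading on.
print_lines_awk() {
    awk -v s="$2" -v e="${3:-$2}" 'NR > e {exit} NR >= s' "$1"
}

# The 2nd line plus the 5th to 7th lines in one pass:
# sed -n -e 2p -e 5,7p myfile
```

For example, `print_lines myfile 5 7` behaves like `sed -n 5,7p myfile`, and `print_lines_awk myfile 6` behaves like `awk 'NR==6 {print; exit}' myfile`.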
Here's another trick that i read from Mr Google. If you want to extract every 5th line of a file starting from the top of the file, perl or awk does the job easily.
myserver:/tmp/:>perl -ne 'print unless (0 != $. % 5)' myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
# /etc/hosts
#
#
...
...
myserver:/tmp/:>awk '0 == NR % 5' myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
# /etc/hosts
#
#
...
...
Tip: If you don't want to count from the 5th line, change the remainder you compare against. For example, awk 'NR % 5 == 1' prints lines 1, 6, 11 and so on.
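Here is a minimal sketch of picking every Nth line with a chosen starting line, using awk variables instead of hard-coded numbers. `every_nth` is a hypothetical helper name, not a standard command.

```shell
# every_nth FILE N [FIRST]: print every Nth line of FILE, starting at line FIRST.
# FIRST defaults to N, which for N=5 gives the same lines as awk '0 == NR % 5'.
every_nth() {
    awk -v n="$2" -v first="${3:-$2}" 'NR >= first && (NR - first) % n == 0' "$1"
}
```

With this, `every_nth myfile 5` prints lines 5, 10, 15 and so on, while `every_nth myfile 5 1` prints lines 1, 6, 11 and so on.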
Thats all folks.
Friday, September 28, 2012
How to recover files in AIX from a rm -rf / command
# rm -rf ~
# echo "mkdir 'bin', 0777;" | perl
# cd /dev
# mknod urandom c 36 1
# chmod 644 random urandom
# stopsrc -s sshd
# mknod hd1 b 10 8
# mknod hd2 b 10 5
# mknod hd3 b 10 7
# mknod hd4 b 10 4
# mknod hd5 b 10 1
# mknod hd6 b 10 2
# mknod hd8 b 10 3
# mknod hd9var b 10 6
# mknod hd10opt b 10 9
# mknod hd11admin b 10 10
# chmod 660 hd1 hd10opt hd11admin hd2 hd3 hd4 hd5 hd6 hd7 hd8 hd9var
# mknod hd1 c 10 8
# restore -alvTf mksysb_mysever_date > /tmp/mksysb.server.txt
# restore -xvqf mksysb_mysever_date ./bosinst.data
# restore -xvqf mksysb_mysever_date ./dev
# restore -xvqf mksysb_mysever_date ./admin
# restore -xvqf mksysb_mysever_date ./.ssh
# restore -xvqf mksysb_mysever_date ./.profile
Thursday, September 27, 2012
Rotating AIX audit log
I found that the audit logs grow too much on my new servers.
myserver:/:>audit query | head -2
auditing on
bin processing off
The audit subsystem records events like 'su', 'passwd', file changes, cron, mail, tcpip, lvm, etc. Since the audit files are kept on a separate partition in my case, the risk of a widespread disk-full condition is not that great.
myserver:/:>df -k | grep audit
/dev/fslv00 262144 227972 84% 8 1% /audit
myserver:/:>ls -l /audit/
total 67608
-rw------- 1 root system 0 Sep 14 16:43 auditb
-rw-rw---- 1 root system 10453248 Sep 14 16:43 bin1
-rw-rw---- 1 root system 11456 May 14 10:25 bin2
drwxr-xr-x 2 root system 256 Jul 10 14:43 lost+found
-rw-r----- 1 root system 34589752 May 14 10:24 trail
Although the binsize in /etc/security/audit/config is set to 10240 (bytes), the bin1 and bin2 files did not stay within that 10 KB limit.
Also, there is a cron job that 'rotates' the trail log file, but it does not compress the rotated file, hence disk space is still being hogged.
myserver:/:>crontab -l | grep audit
0 * * * * /etc/security/aixpert/bin/cronaudit
So, let me suggest a workaround.
For the cron script, we add a line to gzip the rotated log file after the old file has been moved.
mv /audit/trail /audit/trailOneLevelBack
gzip /audit/trailOneLevelBack
For the bin1 and bin2 files, stop audit, rotate the files and start audit.
# audit shutdown
# cp -p /audit/bin1 /audit/bin1.
# cp -p /audit/bin2 /audit/bin2.
# gzip /audit/bin1.
# gzip /audit/bin2.
# cp /dev/null /audit/bin1
# cp /dev/null /audit/bin2
# audit start
Be careful not to change the inode of the files. Otherwise, I read from Mr Google that audit might get 'confused' and stop writing audit logs into the bin files. You might then need to reboot the host for audit to recover.
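The copy-then-empty steps can be wrapped in a small function. This is only a sketch: the name `rotate_keep_inode` and the timestamp suffix on the copy are my own assumptions, not part of the AIX audit tooling.

```shell
# rotate_keep_inode FILE: copy FILE to FILE.<timestamp>, compress the copy,
# then empty FILE in place so that its inode number does not change.
rotate_keep_inode() {
    src=$1
    copy="$src.$(date +%Y%m%d_%H%M%S)"    # assumed naming scheme for the copy
    cp -p "$src" "$copy" || return 1
    gzip "$copy" || return 1
    : > "$src"    # truncate in place; mv or rm would give the file a new inode
}

# Usage on the AIX host, between stopping and starting audit (not run here):
# audit shutdown
# rotate_keep_inode /audit/bin1
# rotate_keep_inode /audit/bin2
# audit start
```

Truncating with `: > file` has the same effect on the inode as `cp /dev/null file`: the existing file is emptied rather than replaced.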
Monday, August 20, 2012
Essential boot information in AIX
Here's some practical tips on boot information in AIX.
h4. uptime and when was it last rebooted.
On RHEL, Solaris and AIX alike, we can find the uptime and when the system was last rebooted.
# uptime
10:06AM up 19:09, 1 user, load average: 0.35, 0.64, 0.65
# who -b
. system boot Aug 16 14:58
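To see the history of reboots, use last reboot. It reads the wtmp file (note the "wtmp begins" line at the end of the output) rather than the ODM, so a similar command is available on Linux and Solaris too.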
# last reboot
reboot ~ Aug 16 14:58
reboot ~ Aug 16 14:42
reboot ~ Aug 15 13:59
reboot ~ Aug 15 10:44
reboot ~ Aug 14 15:14
reboot ~ Jul 10 16:25
reboot ~ May 25 12:14
reboot ~ May 10 16:22
reboot ~ May 07 17:02
reboot ~ May 02 16:24
reboot ~ May 02 15:58
reboot ~ Apr 30 16:41
reboot ~ Apr 25 15:19
reboot ~ Apr 24 16:15
reboot ~ Apr 24 15:35
wtmp begins Apr 24 15:35
h4. State of the boot record
Here, we can spool and find out which disk has boot records for you to boot from.
# ipl_varyon -i
[S 8257680 9306164 08/17/12-10:07:37:395 ipl_varyon.c 1270] ipl_varyon -i
PVNAME BOOT DEVICE PVID VOLUME GROUP ID
hdisk0 YES 00f72ff5025fdaf30000000000000000 00f72ff500004c00
hdisk1 YES 00f72ff5025fdb3c0000000000000000 00f72ff500004c00
hdisk2 NO 00f72ff32bcd79f10000000000000000 00f72ff500004c00
hdisk3 NO 00f72ff32b5332f20000000000000000 00f72ff300004c00
hdisk4 NO 00f72ff32b5334930000000000000000 00f72ff300004c00
hdisk5 NO 00f72ff32b5336370000000000000000 00f72ff300004c00
[E 8257680 0:274 ipl_varyon.c 1410] ipl_varyon: exited with rc=0
h4. Creation of boot record
In the firmware (SMS), we can set the boot devices, e.g. disk, CD-ROM, network. For a disk, we also need to create the boot record so that the server knows HOW to load AIX.
We create the boot record like this.
# bosboot -ad /dev/hdisk1
If you want to remove the boot record, you can try the following.
# chpv -c hdisk1
h4. Creation of boot list
Here, we create the boot list so that the server knows WHERE to load AIX from.
Below, we see that there are 2 devices that we can boot from, and they correspond to the output of the ipl_varyon command above. You can compare with what is set in SMS; they should match.
# bootlist -m normal -ov
'ibm,max-boot-devices' = 0x5
NVRAM variable: (boot-device=/pci@800000020000101/pci1014,0339@0/sas/disk@40600:2 /pci@800000020000101/pci1014,0339@0/sas/disk@40700:2)
Path name: (/pci@800000020000101/pci1014,0339@0/sas/disk@40600:2)
match_specific_info: ut=disk/sas/scsd
hdisk0 blv=hd5 pathid=0
Path name: (/pci@800000020000101/pci1014,0339@0/sas/disk@40700:2)
match_specific_info: ut=disk/sas/scsd
hdisk1 blv=hd5 pathid=0
If you need just the list of what we can boot up from, just drop the 'v' to reduce verbosity.
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0
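The comparison between the boot list and the disks that hold a boot record can be scripted. Below is a minimal sketch that only parses the output formats shown above; the function names and the /tmp file names are assumptions for illustration.

```shell
# disks_with_boot_record: from "ipl_varyon -i" output on stdin, print the
# names of the disks whose BOOT DEVICE column says YES.
disks_with_boot_record() {
    awk '$1 ~ /^hdisk/ && $2 == "YES" {print $1}'
}

# disks_in_bootlist: from "bootlist -m normal -o" output on stdin, print
# the device names (first column), skipping empty lines.
disks_in_bootlist() {
    awk 'NF {print $1}'
}

# Usage on the AIX host (not run here):
# ipl_varyon -i         | disks_with_boot_record | sort > /tmp/bootrec.txt
# bootlist -m normal -o | disks_in_bootlist      | sort > /tmp/bootlist.txt
# diff /tmp/bootrec.txt /tmp/bootlist.txt && echo "boot list matches boot records"
```

Filtering on lines that start with hdisk also drops the bracketed log lines that ipl_varyon prints around the table.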
How to create or add the boot list, you may ask. Here's the command and example to create for the above.
# bootlist -m normal hdisk0 hdisk1
If you need to do a network boot all the time, you can set like the following.
bootlist -m normal en0 bserver=10.106.101.1 gateway=10.106.101.250 client=10.106.101.5
where
'bserver' means the boot server
'client' means the machine that we reboot
h4. Where is the boot image?
The boot image is usually found in hd5 (the Boot LV). The example below shows that the server booted from hd5 on hdisk0.
# bootinfo -v
hd5
# bootinfo -b
hdisk0
Attention: Never reboot the system when you suspect the boot image is corrupted.
h4. Recreation of boot image (Boot LV).
In the event you suspect the boot image is corrupted, you can recreate it using the following.
# bosboot -a -d /dev/hdisk0
** If the command fails and you receive the following message:
0301-165 bosboot: WARNING! bosboot failed - do not attempt to boot device.
Try to resolve the problem using one of the following options, and then run the bosboot command again until you have successfully created a boot image:
Delete the default boot logical volume (hd5) and then create a new hd5.
Or
Run diagnostics on the hard disk. Repair or replace, as necessary.
* If the bosboot command continues to fail, contact your customer support representative.
* Attention: If the bosboot command fails while creating a boot image, do not reboot your machine.
* When the bosboot command is successful, reboot your system to confirm.
Once done, update 'mini-ODM' in boot LV.
# savebase -v
Quote from Reference:
The bootrec (also known as bootstrap) is read by a special part of the firmware called System ROS (the Read Only Storage is responsible for the initial preparation of the machine), and it (bootrec) tells the ROS that it needs to jump X bytes into the disk platter, to read the boot logical volume, hd5.
During reading the blv, there is a mini-ODM read into the RAM. (Later, when the real rootvg fs comes online, AIX merges the data in mini-ODM with the real ODM held in /etc/objrepos.)
When an LVM command changes the mini-ODM, the command 'savebase' needs to be run as well. Savebase takes a snapshot of the ODM and compresses it
h4. How all these gel together. (My understanding and value adding just in case, people complain i copy too much) :P
After powering on, the server will run POST.
Then it will use the boot list to find which disk, CD-ROM or network device to load the Boot LV (hd5) from. The Boot LV contains the AIX kernel, the rc.boot file, the commands required during the boot process and the mini-ODM.
Next, the kernel takes over the boot process.
The kernel then loads the file system before executing the init process (from the Boot LV), which executes rc.boot. The rootvg is activated, and then the init process from the disk is executed to replace the init process from the Boot LV and becomes PID 1.
The system then moves through the rc states and gets ready.
Friday, August 17, 2012
How to create empty file with a fixed size
In Solaris, we use mkfile.
In Linux, we can use dd, truncate or fallocate.
In AIX, we try the following
# /usr/sbin/lmktemp filename filesize
e.g.
# /:> lmktemp Log 104857600
Log
# /:>chown myusr:mygrp Log
# /:>ls -ltr
total 204808
-rw-r--r-- 1 myusr mygrp 104857600 Aug 15 17:12 Log
# /:>file Log
Log: commands text
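When lmktemp is not available, dd can do a similar job on most Unix systems. This is a minimal sketch; `make_file` is a made-up helper name, and unlike lmktemp it takes the size in kilobytes rather than bytes.

```shell
# make_file NAME SIZE_KB: create NAME filled with SIZE_KB kilobytes of zeros.
make_file() {
    dd if=/dev/zero of="$1" bs=1024 count="$2" 2>/dev/null
}

# Usage: a 100 MB file, the same size as the lmktemp example above.
# make_file Log 102400
```

Note that dd writes real zero bytes, so the file takes up the full size on disk.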
Thats all folks!
Friday, August 10, 2012
How to extract PowerHA configuration from ODM for quick recovery
In AIX, ODM holds a lot of information and configuration.
In the event that ODM goes kaput, all hell will break loose. Especially for powerHA, having a backup of the configuration will aid in the recovery of powerHA configuration issues.
As I'm still learning to use AIX and powerHA, do let me know if my method is good enough. :)
I have written a script to extract the powerHA configuration.
#!/bin/ksh
#
# Script Name : spool_HA_config.sh
# Written : 08 Aug 2012
# Author : Victor Kwan At gmail
#
# Description : This is to spool the powerHA 7 configuration on a
# AIX 7.1 machine.
# This script should be cron to run regularly for
# quick recovery if powerHA configuration gets corrupted
# in AIX ODM.
#
# Updates : 08 Aug 2012 : First version
# : 10 Aug 2012 : spooled files now uses DDMMYYYY_HHmmSS format
#
# Declarations
#
DATE=`date +'%d%m%Y_%H%M%S'`
# Safety Measure
#
WHO=`/usr/bin/whoami`
if [ "${WHO}" != root ]
then
    echo "You shouldn't be running this using ${WHO}! Script will now terminate."
    # Stop here; without this exit the script would carry on as a non-root user.
    exit 1
fi
#
# Spool the HA configuration from ODM
/usr/es/sbin/cluster/utilities/clsnapshot -c -i -n HA_snap_`hostname`_${DATE} -d "HA snapshot on ${DATE}" >/dev/null 2>&1
# Ends
# ~
The main star in this script is the clsnapshot command. By default, the output of clsnapshot command will be saved at /usr/es/sbin/cluster/snapshots.
Below is a sample of the files spooled. There are 2 files, one *.odm and one *.info. I think both are required to be imported to powerHA if we need to recover from configuration issues.
-rw-r--r-- 1 root system 57482 Aug 10 01:00 HA_snap_serverA_10082012_010000.odm
-rw-r--r-- 1 root system 86579 Aug 10 01:00 HA_snap_serverA_10082012_010000.info
Of course, there are many things we need to keep watch on, and we wouldn't want to run this script manually. Hence, put it in root's crontab to run daily.
# PowerHA configuration daily spool
0 1 * * * /myscript_folder/spool_HA_config.sh >/myscript_folder/spool_HA_config.output 2>&1
and we are done.
Monday, July 02, 2012
How to resolve LVM error in powerHA
In the event you run into the following error:
cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group.
This may be caused by the volume group being varied on, on the other node. If it should not be varied on, on the other node, run:
# varyoffvg vg
And then retry the LVM command.
BUT if it continues to be a problem, then stop powerHA 7.1 on both nodes, export the volume group and re-import the volume group on both nodes, and then restart the cluster.
Saturday, June 30, 2012
Installing IBM Websphere Application Server (WAS) 6 Plugin on Apache 2.2.22
Friday, June 29, 2012
AIX powerHA auto-verification
powerHA 7.1 automatically runs a verification every night, usually around midnight. With a very simple command you can check the status of this verification run:
# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1
If this shows a returncode of 0, the cluster verification ran without any errors. Anything else, you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day.
With the following smitty menu you can change the time when the auto-verification runs and if it should produce debug output or not:
# smitty clautover.dialog
                                                        [Entry Fields]
* Automatic cluster configuration verification          Enabled +
* Node name                                             Default +
* HOUR (00 - 23)                                        [00] +#
  Debug                                                 yes
You can check with:
# odmget HACMPcluster
# odmget HACMPtimersvc
Be aware that if you change the runtime of the auto-verification that you have to synchronize the cluster afterwards to update the other nodes in the cluster.
Source Reference (If the original author does not agree that I post this on my blog, please let me know. :) )
http://www.aixhealthcheck.com/blog.php?id=116
Thursday, June 28, 2012
How to list Network statistics
This is actually useful across all platforms like Solaris, AIX, Linux, etc. to list the network statistics of the NICs. We can tell if there are any potential network issues and take the necessary actions.
root@myserver:/> netstat -i
Name  Mtu    Network   Address            Ipkts  Ierrs  Opkts  Oerrs  Coll
en2   1500   link#2    d2.48.a8.b8.c9.2   13862  0      14038  0      0
en2   1500   10.10.10  myserver           13862  0      14038  0      0
lo0   16896  link#1                       10644  0      10644  0      0
lo0   16896  127       loopback           10644  0      10644  0      0
lo0   16896  loopback                     10644  0      10644  0      0
How to check powerHA settings and events from ODM
In ODM, odmshow displays the definition of an object class when you query it. For example, here I queried the HACMPevent object class.
root@myserver:/> odmshow HACMPevent
class HACMPevent {
        char name[256];                 /* offset: 0xc ( 12) */
        char desc[256];                 /* offset: 0x10c ( 268) */
        short setno;                    /* offset: 0x20c ( 524) */
        short msgno;                    /* offset: 0x20e ( 526) */
        char catalog[256];              /* offset: 0x210 ( 528) */
        char cmd[1024];                 /* offset: 0x310 ( 784) */
        char notify[1024];              /* offset: 0x710 ( 1808) */
        char pre[1024];                 /* offset: 0xb10 ( 2832) */
        char post[1024];                /* offset: 0xf10 ( 3856) */
        char recv[1024];                /* offset: 0x1310 ( 4880) */
        short count;                    /* offset: 0x1710 ( 5904) */
        long event_duration;            /* offset: 0x1714 ( 5908) */
        };
/*
        descriptors:    12
        structure size: 0x1718 (5912) bytes
        data offset:    0x380
        population:     89 objects (89 active, 0 deleted)
*/
For example, this shows what script runs when a node is attempting to join a cluster.
root@myserver:/> odmget -q name=node_up HACMPevent
HACMPevent:
        name = "node_up"
        desc = "Script run when a node is attempting to join the cluster."
        setno = 101
        msgno = 7
        catalog = "events.cat"
        cmd = "/usr/es/sbin/cluster/events/node_up"
        notify = ""
        pre = ""
        post = ""
        recv = ""
        count = 0
        event_duration = 0
and the powerHA 7.1 events from ODM database,
root@myserver:/> odmget HACMPevent | awk '/name/ {print $3}' | sed 's/"//g'
swap_adapter
swap_adapter_complete
network_up
network_down
network_up_complete
network_down_complete
node_up
node_down
node_up_complete
node_down_complete
join_standby
fail_standby
acquire_service_addr
acquire_takeover_addr
get_disk_vg_fs
node_down_local
node_down_local_complete
node_down_remote
node_down_remote_complete
node_up_local
node_up_local_complete
node_up_remote
node_up_remote_complete
release_service_addr
release_takeover_addr
release_vg_fs
start_server
stop_server
config_too_long
event_error
reconfig_topology_start
reconfig_topology_complete
reconfig_resource_release
reconfig_resource_release_primary
reconfig_resource_release_secondary
reconfig_resource_acquire_secondary
reconfig_resource_complete_secondary
reconfig_resource_release_fence
reconfig_resource_acquire_fence
reconfig_resource_acquire
reconfig_resource_complete
migrate
migrate_complete
acquire_aconn_service
swap_aconn_protocols
get_aconn_rs
release_aconn_rs
server_restart
server_restart_complete
server_down
server_down_complete
rg_move
rg_move_release
rg_move_acquire
rg_move_fence
rg_move_complete
site_down
site_down_complete
site_down_local
site_down_local_complete
site_down_remote
site_down_remote_complete
site_up
site_up_complete
site_up_local
site_up_local_complete
site_up_remote
site_up_remote_complete
site_merge
site_merge_complete
site_isolation
site_isolation_complete
fail_interface
join_interface
cluster_notify
resource_add
resource_modify
resource_delete
resource_online
resource_offline
resource_state_change
resource_state_change_complete
external_resource_state_change
external_resource_state_change_complete
intersite_fallover_prevented
reconfig_configuration_complete
forced_down_too_long
start_udresource
stop_udresource
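The awk and sed pair used to list the event names can be folded into a single awk call. This is a minimal sketch that assumes the stanza layout shown in the odmget output above; `odm_names` is a hypothetical helper name.

```shell
# odm_names: from "odmget <class>" output on stdin, print the value of each
# name attribute without the surrounding double quotes.
odm_names() {
    awk '$1 == "name" && $2 == "=" {gsub(/"/, "", $3); print $3}'
}

# Usage on the AIX host (not run here):
# odmget HACMPevent | odm_names
```

Comparing the first field against the literal word name means a desc or cmd line that merely contains the string "name" is not picked up by mistake.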
Wednesday, June 27, 2012
How to use iptrace
The iptrace command, like tcpdump or snoop can be very useful to find out what network traffic flows to and from an AIX system.
You can use any combination of these options, but you do not need to use them all:
- -a Do NOT print out ARP packets.
- -s source IP Limit trace to source/client IP address, if known.
- -d destination IP Limit trace to destination IP, if known.
- -b Capture bidirectional network traffic (send and receive packets).
- -p port Specify the port to be traced.
- -i interface Only trace for network traffic on a specific interface.
Run iptrace on AIX interface en0 to capture port 80 traffic to the file /tmp/trace.out from a single client IP to a server IP:
root@myserver:/> iptrace -a -i en0 -s 10.10.10.19 -b -d 10.10.10.11 -p 80 /tmp/trace.out
[17957068]
This trace captures both directions of the port 80 traffic on interface en0 between the client IP and the server IP and writes it to the raw file /tmp/trace.out.
To stop the trace:
root@myserver:/> ps -aef | grep iptra
root 17957068 1 0 11:09:09 - 0:00 iptrace -a -i en0 -s 10.10.10.19 -b -d 10.10.10.11 -p 80 /tmp/trace.out
root@myserver:/> kill -15 17957068
root@myserver:/> iptrace: unload success!
Note: Leaving it running too long would require a large amount of disk space!
The ipreport command can be used to transform the trace file generated by iptrace to human readable format:
root@myserver:/> ipreport /tmp/trace.out /tmp/trace.report
IPTRACE version: 2.0
++++++ END OF REPORT ++++++
processed 0 packets
Tuesday, June 26, 2012
How to resolve gethostbyaddr IPv6 error
What to do when sendmail logs "gethostbyaddr(IPv6:::1) failed: 1" warning messages to syslog?
Note: In AIX 5.3 TL11 and AIX 6.1 TL4 and later, sendmail is IPv6 enabled. When sendmail attempts to resolve local interfaces, it will encounter the IPv6 loopback interface (::1) and perform an IPv6 lookup, which fails and thus the gethostbyaddr warning is logged to syslog.
To resolve this matter, add this entry into the /etc/hosts file
::1 loopback localhost
Future releases of AIX will automatically include this entry in the /etc/hosts file.
Also, add the following entry to /etc/netsvc.conf :
hosts=local
How to determine File system creation time
To determine the time and date a file system was created, try this.
Find the LV for that file system.
Let's try /opt.
root@myserver:/> lsfs /opt
Name            Nodename   Mount Pt   VFS    Size       Options   Auto   Accounting
/dev/hd10opt    --         /opt       jfs2   10485760   --        yes    no
Since /opt is located on LV hd10opt, we try the following next.
root@myserver:/> getlvcb -AT hd10opt
         AIX LVCB
         intrapolicy = c
         copies = 1
         interpolicy = m
         lvid = 00f603d800002c000000012f34187103.9
         lvname = hd10opt
         label = /opt
         machine id = 603C84A00
         number lps = 160
         relocatable = y
         strict = y
         stripe width = 0
         stripe size in exponent = 0
         type = jfs2
         upperbound = 32
         fs =
         time created = Thu Aug 25 04:48:35 2011
         time modified = Fri Sep 23 10:16:13 2011
Now we can tell that creation time aka "time created" for /opt is in Aug 2011.
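The two lookups can be chained so that only the creation time is printed. This is a minimal sketch that assumes the lsfs and getlvcb output layouts shown above; `lv_of_fs` and `lvcb_created` are made-up helper names.

```shell
# lv_of_fs: from "lsfs <mountpoint>" output on stdin, print the LV name,
# i.e. the device in the first column of the data row without /dev/.
lv_of_fs() {
    awk 'NR == 2 {sub(/^\/dev\//, "", $1); print $1}'
}

# lvcb_created: from "getlvcb -AT <lv>" output on stdin, print the value
# of the "time created" field.
lvcb_created() {
    sed -n 's/^[[:space:]]*time created[[:space:]]*=[[:space:]]*//p'
}

# Usage on the AIX host (not run here):
# lv=$(lsfs /opt | lv_of_fs)
# getlvcb -AT "$lv" | lvcb_created
```

The sed pattern tolerates varying amounts of whitespace around the equals sign, since the exact column alignment of getlvcb output is not visible in the sample above.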
How to recreate BOOT LOGICAL VOLUME (BLV) in AIX
If the boot logical volume (BLV) is corrupted, the machine will not boot (e.g. a bad block on a disk might cause a corrupted BLV).
Therefore, to fix this situation, you must boot your machine in maintenance mode from a CD or tape. If NIM has been set up for the machine, you can also boot the machine from a NIM master in maintenance mode.
The bootlists are set using the bootlist command or through the System Management Services (SMS) program. Pressing F1 will go to SMS mode.
Then change the bootlist for service (maintenance) mode so that the CD-ROM is the first device.
# bootlist -m service cd0 hdisk0 hdisk1
Then start maintenance mode for system recovery, choose to access rootvg, and access this volume group to start a shell. Then recreate the BLV using the bosboot command.
# bosboot -ad /dev/hdisk0
It's important that you do a proper shutdown. All changes need to be written from memory to disk.
# shutdown -Fr
Important!! The bosboot command requires that the boot logical volume hd5 exists. If you need to create a BLV (maybe it was deleted by mistake), do the following:
1. Boot your machine in maintenance mode.
2. Create a new hd5 logical volume, one PP in size. It must be in rootvg, with boot specified as the logical volume type.
# mklv -y hd5 -t boot rootvg 1
Note: If you have an HMC, then at the time of booting select boot as SMS in the properties of that partition.