
Tuesday, May 28, 2013

How to list the top 5 largest directories

# du -sk /opt/* | sort +0nr | head -5
25456556        /opt/data
12634192        /opt/read
5483564 /opt/download
196104  /opt/scripts
132964  /opt/freeware


Tip: if you want to include hidden files and directories, try this.

# du -sk .[a-z]* * | sort +0nr | head -5
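The +0nr form is the old sort syntax and newer sort implementations reject it. The POSIX -k form gives the same top-N listing; a minimal sketch, using a scratch directory as a stand-in for /opt so the ordering is visible:

```shell
# Scratch directory standing in for /opt; on a real system just use /opt/*
BASE=$(mktemp -d)
mkdir "$BASE/data" "$BASE/scripts"
dd if=/dev/zero of="$BASE/data/big" bs=1024 count=64 2>/dev/null
printf 'x' > "$BASE/scripts/small"

# -k1,1nr: sort numerically, descending, on the first (size) column
du -sk "$BASE"/* | sort -k1,1nr | head -5
```

The largest directory ($BASE/data here) comes out on top, same as with sort +0nr.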

Wednesday, May 22, 2013

How to un-mount a volume forcefully


If you ever want to unmount a volume forcefully when the system does not allow it, you need to find out who is holding on to the resource, terminate it, and then unmount. This is very important to prevent data loss.

Who is holding on to the volume

# lsof | grep "/opt/download"


# fuser -cu /opt/data/System.log

Now terminate them

# kill -9 <PID>

The PID will be determined from the fuser or lsof command above.
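If several processes are involved, the lsof output can be boiled down to a unique PID list with awk. This assumes lsof's usual layout with the PID in column 2; the sample output below is made up for illustration:

```shell
# Made-up lsof output for illustration; PID is the second column
lsof_sample='COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF NODE NAME
java     1234 app   3r   REG   10,5  1048576  99 /opt/download/a.log
java     1234 app   4w   REG   10,5  2097152 100 /opt/download/b.log
tail     5678 ops   3r   REG   10,5  1048576  99 /opt/download/a.log'

# Skip the header line, print column 2, and de-duplicate
printf '%s\n' "$lsof_sample" | awk 'NR > 1 { print $2 }' | sort -u
```

In real use you would pipe the actual lsof output through the same awk and pass the resulting PIDs to kill.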

Now unmount the volume peacefully

# umount /opt/download

If you disregard who's active and want to unmount right away,

  # fuser -km /opt/download

In case you want to unmount an NFS volume that is unreachable, try this.

# umount -f /opt/download

Done...

Tuesday, May 21, 2013

How to retrieve the list of failed logins in AIX

short and sharp.

# /usr/sbin/acct/fwtmp < /etc/security/failedlogin  | more

Wednesday, April 17, 2013

AIX su restriction using sugroup



In AIX, we can restrict who can su to a particular user account using the sugroups parameter.

Background

Using the below example to explain.

We have normal user account ‘user1’.
We are going to create the admin account for ‘user1’, this account is ‘admin1’.

We do not want to allow any NON-admin to access 'admin1', hence we use sugroups to restrict it. Here, I used the 'admingrp' group since all admins are in this group.

Not to worry if another DBA accesses 'admin1', since /var/log/authlog will show who used that account. The example log below shows someone using root to access the 'user2' account before using 'admin1'.

devserver:/:>tail -2 /var/log/authlog
Feb 14 10:12:11 devserver auth|security:notice su: from root to user2 at /dev/pts/0
Feb 14 10:12:15 devserver auth|security:notice su: from user2 to admin1 at /dev/pts/0
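Log lines in that format can be summarised with a short awk one-liner. Field positions are taken from the sample lines above; the line below is a copy of one of them:

```shell
# One su: line in the authlog format shown above
line='Feb 14 10:12:15 devserver auth|security:notice su: from user2 to admin1 at /dev/pts/0'

# Field 8 is the originating user, field 10 the target account
printf '%s\n' "$line" | awk '/ su: from / { printf "%s -> %s\n", $8, $10 }'
```

Running the same awk over the whole file (awk '...' /var/log/authlog) gives a quick who-switched-to-whom report.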

Parameters and Procedure

The account can be created via smitty in its entirety. Further notes:

- The requirement is that 'admin1' should be a member of the 'appgrp' group so that he can access files in appgrp.

- 'admin1' also needs to be a member of the staff group to access 'user1' files.

- Since this is a privileged account, we set the following:
  o No remote login (SSH, telnet, etc.)
  o No local login (physical, console)
  o Shorter account expiry

- sugroups set to 'admingrp'

If you want to use the command line, which I don't really recommend unless you are scripting:

devserver:/:>mkuser "id=11705" "pgrp=appgrp" "groups=appgrp,staff" "home=/home/admin1" "shell=/usr/bin/bash" "gecos=name name name" "login=false" "su=true" "rlogin=false" "admin=false" "sugroups=admingrp" "maxage=5" admin1
devserver:/:>passwd admin1
devserver:/:>pwdadm -c admin1

Result

Remote login will fail.

me@server [~]
~$ssh admin1@devserver
admin1@devserver's password:
Received disconnect from 10.10.50.10: 2: Remote login for account admin1 is not allowed.

Access to root will fail.

devserver:/:>su - user1
$ su - admin1
admin1's Password:
-bash-3.2$ su -
root's Password:
You are not allowed to su to this account.

Cannot su to "root" : Account is not accessible.

Non admin will not be able to access this account.

me@devserver [~]
~$su - admin1
admin1's Password:
You are not allowed to su to this account.

Cannot su to "admin1" : Account is not accessible.

Another admin can access this account.

devserver:/:>su - user2
$ su - admin1
admin1's Password:
-bash-3.2$ id
uid=11705(admin1) gid=101(appgrp) groups=1(staff)
  

Thursday, October 18, 2012

My understanding of RBAC in AIX

What is RBAC?

It stands for Role Based Access Control.
There are major differences between RBAC in AIX 5.3 and older and RBAC in AIX 6.1/7.1. There is no value in discussing the older RBAC, so I will explain "enhanced RBAC" instead.
Three primary rules are defined for RBAC:
  • Role assignment: A subject can exercise a permission only if the subject has selected or been assigned a role.
  • Role authorization: A subject's active role must be authorized for the subject.
    With rule 1 above, this rule ensures that users can take on only roles for which they are authorized.
  • Permission authorization: A subject can exercise a permission only if the permission is authorized for the subject's active role.
    With rules 1 and 2, this rule ensures that users can exercise only permissions for which they are authorized.

The traditional DAC

Traditional access control, which we call DAC (discretionary access control), has been used for ages and taken for granted. The familiar string r-x------ is fundamental for every sys admin. DAC provides SUID, SGID, etc., but the control scope only deals with ALL, GROUP or OWNER access.

AIX RBAC

RBAC provides precise access control such that a target role can only be assumed by particular users. The range of commands the role can access can be a subset of all the commands that root or any other account actually has.

Difference from SUDO

SUDO is another means to control access to privileged commands. However, it can be tedious to configure each and every command that you want to allow an account to access.

Difference from Solaris RBAC

In essence, both Solaris RBAC and AIX RBAC are similar. The main difference is the way to implement it.
In Solaris, we use mainly the following files to setup RBAC.

root:/ #ls -l /etc/user_attr /etc/security/exec_attr /etc/security/prof_attr /etc/security/auth_attr
-rw-r--r--   1 root     sys        11855 Mar 28  2012 /etc/security/auth_attr
-rw-r--r--   1 root     sys        20934 Aug 15 11:40 /etc/security/exec_attr
-rw-r--r--   1 root     sys         8433 Aug 15 11:42 /etc/security/prof_attr
-rw-r--r--   1 root     sys         1292 Aug 30 12:01 /etc/user_attr

Authorisation file for Solaris.

root:/ #tail -3 /etc/security/auth_attr
solaris.system.:::Machine Administration::help=SysHeader.html
solaris.system.date:::Set Date & Time::help=SysDate.html
solaris.system.shutdown:::Shutdown the System::help=SysShutdown.html

In AIX, this authorisation list is kept in a DB. You can create custom ones, especially for those not already in the DB. AIX doesn't provide the help HTML files. In reality, do we use them?

server:/: lsauth ALL | tail -3
wpar.mobility.appli id=10014
wpar.mobility.appli.other id=10016
wpar.mobility.appli.owner id=10015

The Solaris file that manages the effective privilege level used to execute commands:

root:/ #tail -3 /etc/security/exec_attr
Zone Management:solaris:cmd:::/usr/sbin/zoneadm:uid=0
Zone Management:solaris:cmd:::/usr/sbin/zonecfg:uid=0
DisasterRecovery Admin:suser:cmd:::/opt/sysadmin/Portnet_DR_Scripts/*:uid=root


Next, this profile file does not carry much meaning; it only maintains the profile name and the description of the role:

root:/: #tail -5 /etc/security/prof_attr
ZFS Storage Management:::Create and Manage ZFS Storage Pools:help=RtZFSStorageMngmnt.html
Zone Management:::Zones Virtual Application Environment Administration:help=RtZoneMngmnt.html
dtwm:::Do not assign to users. Actions and commands required for the window manager (dtwm).:help=Rtdtwm.html
shutdown:::Do not assign to users. Contains actions requiring shutdown authorization.:auths=solaris.system.shutdown;help=Rtshutdown.html
DisasterRecovery Admin:::For running DisasterRecovery scripts:

In AIX, here is the equivalent, though we can set much more information, like password control, access to smitty and so on:

server:/: lsrole -f appadmin
appadmin:
        authorizations=aix.system.cluster
        rolelist=
        groups=admingrp
        visibility=1
        screens=*
        dfltmsg=role to manage Application resources
        msgcat=
        auth_mode=NONE
        id=11

In Solaris, the file that assigns who can assume the role:

root:/ #tail -3 /etc/user_attr
me::::type=normal;profiles=DNS Admin
you::::type=normal;profiles=DNS Admin
her::::type=normal;profiles=DNS Admin

AIX keeps this information in the ODM too.

server:/: lsuser -f meuser | grep role
        default_roles=
        roles=appadmin

How to setup

Say, for instance, PowerHA can only be accessed by root, but to allow menu control of cluster resources, we need a means to start/stop/restart/suspend/resume/failover the resources without using root. It is a bad security idea to allow a menu to manage the cluster resources via the root account.

Hence, we authorise, say, meuser to access PowerHA administrative commands by giving it the ibm.hacmp.admin authorisation. How do we do that?

Check that Enhanced RBAC is enabled.

# lsattr -El sys0 -a enhanced_RBAC
enhanced_RBAC true Enhanced RBAC Mode True 


Let's create the authorisations.

/:> mkauth dfltmsg='IBM custom' ibm
/:> mkauth dfltmsg='IBM custom hacmp' ibm.hacmp
/:> mkauth dfltmsg='IBM custom hacmp admin' ibm.hacmp.admin 

Then check out what privileges the commands that you are using require.

# tracepriv -ef /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
apps_rg     ONLINE                       servera
               OFFLINE                      serverb

9568366: Used privileges for /usr/es/sbin/cluster/utilities/clRGinfo:
  PV_AU_ADMIN                        PV_NET_CNTL
  PV_NET_PORT
 
# tracepriv -ef /usr/es/sbin/cluster/events/utils/cl_RMupdate 
...
...
...


If you need to use your own shell script, you may need to add it into the privileged command database, and allow the EUID to be equal to the owner of that script.


Now we add the commands into the privileged command database.

/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_NET_PORT,PV_NET_CNTL accessauths=ibm.hacmp.admin /usr/es/sbin/cluster/utilities/clRGinfo
/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_KER_ACCT,PV_PROC_PRIV accessauths=ibm.hacmp.admin euid=0 /usr/es/sbin/cluster/events/utils/cl_RMupdate 

/:> setsecattr -c innateprivs=PV_AU_ADMIN,PV_KER_ACCT,PV_PROC_PRIV accessauths=ibm.hacmp.admin euid=0 /admin.sh
/:> setsecattr -c innateprivs=PV_AU_ADMIN accessauths=ibm.hacmp.admin euid=0 /dlpar.sh

You can verify by using lssecattr.

/:> lssecattr -F -c /dlpar.sh
/dlpar.sh:
        euid=0
        accessauths=ibm.hacmp.admin
        innateprivs=PV_AU_ADMIN



Now, we create a role with the above authorisations.


# mkrole authorizations=ibm.hacmp.admin dfltmsg="Custom role to do admin with hacmp" appadmin

If it's for automation, you may want to remove password authentication from the role with the following command:
chrole auth_mode=NONE appadmin
By default, auth_mode is INVOKER, which means the invoker's password is required.

Next, allow meuser to be able to assume the role

 chuser roles=appadmin meuser

Before you try it out, you need to update the kernel security tables for all these to take effect. As the AIX kernel is RBAC aware for all the IBM system commands, any changes will not take effect without updating the kernel tables.

# setkst

Try it out

# swrole appadmin

If you are not allowed to assume the role, you will receive the following error. In this example, thatuser is not allowed to assume the appadmin role.

server:/HAapps: su - thatuser
-bash-3.2$ swrole appadmin
swrole: 1420-052 appadmin is not a valid role for thatuser.

The meuser account, on the other hand, is authorised to assume the role.

server:/HAapps: su - meuser
-bash-3.2$ swrole appadmin
bash-3.2$ /usr/es/sbin/cluster/events/utils/cl_RMupdate suspend_appmon apps apps_rg
Suspend HA Monitoring for apps.
2012-10-22T15:58:03.289727
2012-10-22T15:58:03.309369
Oct 22 2012 15:58:03 cl_RMupdate: Completed request to suspend monitor(s) for application apps.
Oct 22 2012 15:58:03 cl_RMupdate: The following monitor(s) are in use for application apps:
apps_svr
apps_dm
Reference: http://aixhelp.blogspot.sg/2010/12/aix6-rbac.html

Monday, October 01, 2012

My one liner to extracting lines using sed, perl or awk


While working on extracting data from a large number of files, I have compiled some commands over the years that really help a lot.

Most of the time, we use head, tail and grep. However, these commands are good at wholesale extraction or matching by keywords. For more complex extraction, we may use sed, perl or awk instead.

Using myfile as example,

myserver:/tmp/:>head -10 myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
#
# Licensed Materials - Property of IBM
#
# COPYRIGHT International Business Machines Corp. 1985,1989
# All Rights Reserved
#


myserver:/tmp:>tail -2 myfile
10.1.1.123     host1

10.2.1.124     host2


myserver:/tmp/:>grep host1 myfile
10.1.1.123     host1


Say, for more complicated stuff, like extracting the 2nd line PLUS the 5th to 7th lines, I find it tough to code using the above commands.
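For the record, that exact case is doable in one sed pass by giving several -e expressions. A small sketch with a numbered stand-in for myfile; it prints lines 2, 5, 6 and 7:

```shell
# Numbered sample file standing in for myfile
printf '%s\n' 1 2 3 4 5 6 7 8 > myfile

# Print line 2 plus lines 5 to 7 in a single pass
sed -n -e 2p -e 5,7p myfile
```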

h2. sed, perl or awk?

Do note that sed will traverse the entire file, hence if you have a very large file, this might take some time.

Say, we want to extract the 2nd line, we can use sed or awk

myserver:/tmp/:>sed 2p myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
...
...


myserver:/tmp/:>awk 'NR==2' myfile
# This is an automatically generated prolog.






If you try it out, you will see that for sed, the 2nd line is indeed extracted, but the rest of the file is also printed out! Use the following to disable printing the rest of the file.

myserver:/tmp/:>sed -n 2p myfile
# This is an automatically generated prolog.


Alternatively, you might want to 'delete' whatever you don't want by using the '!d' expression.

myserver:/tmp/:>sed '2!d' myfile
# This is an automatically generated prolog.


I wouldn't want to use this method as I have difficulty converting the line to use variables. Do give me suggestions or advice if you think otherwise. I don't claim to be an expert in writing scripts. :)

IMPORTANT: Note that the single quotes are required. Otherwise '!d' will trigger shell history expansion and bring back the last command you executed that started with the letter 'd'.
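On the variable question: double quotes let the shell expand a variable inside the sed expression, and awk takes variables cleanly with -v. A sketch with a small stand-in for myfile; both commands print the word two:

```shell
# Numbered sample file standing in for myfile
printf '%s\n' one two three four five > myfile

n=2

# sed: double quotes so the shell expands ${n}
sed -n "${n}p" myfile

# awk: -v passes the value in without any quoting tricks
awk -v n="$n" 'NR == n { print; exit }' myfile
```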



If you only want one and only one line from the file, you can get awk to exit after printing that line; otherwise awk will traverse the whole file.

myserver:/tmp/:>awk 'NR==6 {print; exit}' myfile
# Licensed Materials - Property of IBM


If we try to extract lines 5 to 7 using sed or awk:

myserver:/tmp/:>sed -n 5,7p myfile
#
# Licensed Materials - Property of IBM
#


myserver:/tmp/:>awk 'NR==5,NR==7' myfile
#
# Licensed Materials - Property of IBM
#


Here's another trick that I read from Mr Google. If you want to extract every 5th line of a file starting from the top, perl or awk does the job easily.


myserver:/tmp/:>perl -ne 'print unless (0 != $. % 5)' myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
#  /etc/hosts
#
#

...
...


myserver:/tmp/:>awk '0 == NR % 5'   myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
#  /etc/hosts
#
#
...
...


Tip: If you don't want the sampling to fall on lines 5, 10, 15, ..., add an offset to NR inside the modulo; for example, (NR + 4) % 5 picks lines 1, 6, 11, and so on.
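To make the offset idea concrete, here is a quick check with a numbered stand-in for myfile; adding 4 to NR shifts the every-5th sampling so it starts at line 1 (it prints 1, 6 and 11):

```shell
# Numbered sample file standing in for myfile
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > myfile

# (NR + 4) % 5 == 0 at NR = 1, 6, 11, ...
awk '0 == (NR + 4) % 5' myfile
```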

That's all folks.

Friday, September 28, 2012

How to recover files in AIX from a rm -rf command

What to do if someone accidentally removes some system-critical files in rootvg?

# rm -rf ~

In this case, the contents of root's home directory will be removed. Since root's home directory on AIX is /, you will see /admin, /dev, /bin, etc. being deleted. Be quick to notice the mistake and halt the rm command.

IMPORTANT: Keep your existing SSH session alive at all costs. Otherwise, working on a terminal via the HMC or similar is going to be painful.

h2. So, it's "oh shit" right?

Hopefully, /lib is not removed yet, else you are in bigger shit.

ssh, rsync and scp will no longer work. Let's do a little self-repair before recovering the rest of the files.

Is your tar command gone? What commands do I have left?

h2. Recover mkdir

I read about this one at http://coding-journal.com/restoring-your-unix-system-after-rm-rf/. Try the following.

# echo "mkdir 'bin', 0777;" | perl

This assumes that you lost your mkdir command but still have perl. Here, the /bin directory is created. The full permission is just for this emergency; you can change it later.
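The same idea extends to other lost commands, as long as perl survived. Two sketches (the file names here are just examples): perl standing in for cat, and the core File::Copy module standing in for cp:

```shell
# 'cat' substitute: print a file's contents
perl -ne 'print' /etc/hosts

# 'cp' substitute: copy a file using File::Copy, which ships with perl
perl -MFile::Copy -e 'copy($ARGV[0], $ARGV[1]) or die $!' /etc/hosts /tmp/hosts.copy
```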

h2. Recovering /dev

If /dev is lost, you may need to recreate some of the more critical device nodes to enable scp and ssh to bring in your backups (mksysb). The steps below were used on AIX 7.1 SP4.

# cd /dev
# mknod random c 36 0
# mknod urandom c 36 1
# mknod null c 2 2
# chmod 644 random urandom
# chmod 666 null

Go ahead and try ssh or rsync. If it does not work, you may need to restart sshd.

# stopsrc -s sshd
# startsrc -s sshd


If you have another server of a similar make, you can also try to recreate the disk device structure, but this is not critical if you have a backup which you can extract later. The naming should follow a standard, since IBM names all the basic disks the same way on AIX 7.1.


# mknod hd1 b 10 8
# mknod hd2 b 10 5
# mknod hd3 b 10 7
# mknod hd4 b 10 4
# mknod hd5 b 10 1
# mknod hd6 b 10 2
# mknod hd8 b 10 3
# mknod hd9var b 10 6
# mknod hd10opt b 10 9
# mknod hd11admin b 10 10
# chmod 660 hd1 hd2 hd3 hd4 hd5 hd6 hd8 hd9var hd10opt hd11admin

The matching character (raw) devices carry an 'r' prefix:

# mknod rhd1 c 10 8
# mknod rhd2 c 10 5
# mknod rhd3 c 10 7
# mknod rhd4 c 10 4
# mknod rhd5 c 10 1
# mknod rhd6 c 10 2
# mknod rhd8 c 10 3
# mknod rhd9var c 10 6
# mknod rhd10opt c 10 9
# mknod rhd11admin c 10 10
# chmod 660 rhd1 rhd2 rhd3 rhd4 rhd5 rhd6 rhd8 rhd9var rhd10opt rhd11admin


h2. Let's bring back the files.

After you bring in your backup, you can then commence restoration. I will list an example using a mksysb file.

Say we need to recover /dev, /admin, bosinst.data, etc. First, double-check that the archive is usable and that the original files are inside it.

# restore -alvTf mksysb_myserver_date > /tmp/mksysb.server.txt

 Then proceed to restore.

# restore -xvqf mksysb_myserver_date ./bosinst.data
# mv bosinst.data /

# restore -xvqf mksysb_myserver_date ./dev
# cd ./dev
# mv * /dev/

# restore -xvqf mksysb_myserver_date ./admin
# cd ./admin
# mv * /admin/

# restore -xvqf mksysb_myserver_date ./.ssh
# mv .ssh /

# restore -xvqf mksysb_myserver_date ./.profile
# mv .profile /

And so on and so forth.

If you have another server of a similar make or build, you may want to go the extra step of verifying whether anything else is still missing.

In addition, go for a reboot at the nearest opportunity to ensure all is working well. Nothing is confirmed until it is tested and proven working.


Thursday, September 27, 2012

Rotating AIX audit log

I found that the audit log grows too much on my new servers.

myserver:/:>audit query | head -2
auditing on
bin processing off


The audit subsystem records events like 'su', 'passwd', file changes, cron, mail, tcpip, lvm, etc. Since the audit files are kept on a separate partition in my case, the risk of widespread disk-full is still not that great.

myserver:/:>df -k | grep audit
/dev/fslv00        262144    227972   84%        8     1% /audit


myserver:/:>ls -l /audit/
total 67608
-rw-------    1 root     system            0 Sep 14 16:43 auditb
-rw-rw----    1 root     system        10453248 Sep 14 16:43 bin1
-rw-rw----    1 root     system        11456 May 14 10:25 bin2
drwxr-xr-x    2 root     system          256 Jul 10 14:43 lost+found
-rw-r-----    1 root     system     34589752 May 14 10:24 trail


Although binsize in /etc/security/audit/config is set to 10240 bytes, the bin1 and bin2 files did not stay within the 10 KB limit.

Also, there is a cron job that 'rotates' the trail log file, but it does not compress the rotated file, hence disk space is still being hogged.

myserver:/:>crontab -l | grep audit
0 * * * * /etc/security/aixpert/bin/cronaudit


So, let me suggest a workaround.

For the cron script, we add a line to gzip the rotated log file after shifting the old file.

mv /audit/trail /audit/trailOneLevelBack
gzip /audit/trailOneLevelBack
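To keep more than one generation, the rotated file can carry a timestamp instead of a fixed name. A sketch using a scratch directory; in real use AUDIT_DIR would be /audit and the two middle lines would go into the cron script:

```shell
# Scratch directory standing in for /audit
AUDIT_DIR=$(mktemp -d)
printf 'audit records\n' > "$AUDIT_DIR/trail"

# Shift the trail aside under a timestamped name, then compress it
STAMP=$(date +%Y%m%d_%H%M%S)
mv "$AUDIT_DIR/trail" "$AUDIT_DIR/trail.$STAMP"
gzip "$AUDIT_DIR/trail.$STAMP"

ls "$AUDIT_DIR"
```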



For the bin1 and bin2 files, stop audit, rotate the files and start audit.

# audit shutdown
# cp -p /audit/bin1 /audit/bin1.
# cp -p /audit/bin2 /audit/bin2.

# gzip /audit/bin1.
# gzip /audit/bin2.

# cp /dev/null /audit/bin1
# cp /dev/null /audit/bin2

# audit start


Be careful not to change the inode of the files. Otherwise, I read from Mr Google that audit might get 'confused' and stop writing audit logs into the bin files; you might then need to reboot the host for audit to recover.

Monday, August 20, 2012

Essential boot information in AIX

Here are some practical tips on boot information in AIX.


h4. uptime and when was it last rebooted.

In RHEL and Solaris, we can only find the uptime and when the system was last rebooted.

# uptime
  10:06AM   up  19:09,  1 user,  load average: 0.35, 0.64, 0.65

# who -b
   .        system boot Aug 16 14:58


In AIX, we have an additional command to list the history of reboot records, read from wtmp.

# last reboot
reboot    ~                                   Aug 16 14:58
reboot    ~                                   Aug 16 14:42
reboot    ~                                   Aug 15 13:59
reboot    ~                                   Aug 15 10:44
reboot    ~                                   Aug 14 15:14
reboot    ~                                   Jul 10 16:25
reboot    ~                                   May 25 12:14
reboot    ~                                   May 10 16:22
reboot    ~                                   May 07 17:02
reboot    ~                                   May 02 16:24
reboot    ~                                   May 02 15:58
reboot    ~                                   Apr 30 16:41
reboot    ~                                   Apr 25 15:19
reboot    ~                                   Apr 24 16:15
reboot    ~                                   Apr 24 15:35

wtmp begins     Apr 24 15:35


h4. State of the boot record

Here, we can find out which disks hold a boot record for you to boot from.

# ipl_varyon -i
[S 8257680 9306164 08/17/12-10:07:37:395 ipl_varyon.c 1270] ipl_varyon -i


PVNAME          BOOT DEVICE     PVID                    VOLUME GROUP ID
hdisk0          YES             00f72ff5025fdaf30000000000000000        00f72ff500004c00
hdisk1          YES             00f72ff5025fdb3c0000000000000000        00f72ff500004c00
hdisk2          NO              00f72ff32bcd79f10000000000000000        00f72ff500004c00
hdisk3          NO              00f72ff32b5332f20000000000000000        00f72ff300004c00
hdisk4          NO              00f72ff32b5334930000000000000000        00f72ff300004c00
hdisk5          NO              00f72ff32b5336370000000000000000        00f72ff300004c00
[E 8257680 0:274 ipl_varyon.c 1410] ipl_varyon: exited with rc=0


h4. Creation of boot record

In the firmware (SMS), we can set the boot devices, e.g. disk, CD-ROM, network. For a disk, we also need to create the boot record so that the server knows HOW to load up AIX.

We create the boot record like this.

# bosboot -ad /dev/hdisk1

If you want to remove the boot record, you can try the following.

# chpv -c hdisk1

h4. Creation of boot list

Here, we create the boot list so that the server knows WHERE to load AIX from.

Below, we see that there are 2 devices we can boot from, and they correspond to the ipl_varyon output above. You can compare with what is set in SMS; they should match.

# bootlist -m normal -ov
'ibm,max-boot-devices' = 0x5
NVRAM variable: (boot-device=/pci@800000020000101/pci1014,0339@0/sas/disk@40600:2 /pci@800000020000101/pci1014,0339@0/sas/disk@40700:2)
Path name: (/pci@800000020000101/pci1014,0339@0/sas/disk@40600:2)
match_specific_info: ut=disk/sas/scsd
hdisk0 blv=hd5 pathid=0
Path name: (/pci@800000020000101/pci1014,0339@0/sas/disk@40700:2)
match_specific_info: ut=disk/sas/scsd
hdisk1 blv=hd5 pathid=0

If you just need the list of devices we can boot from, drop the 'v' to reduce verbosity.

# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0

How do you create or add to the boot list, you may ask. Here's the command to create the list shown above.

# bootlist -m normal hdisk0 hdisk1

If you need to do a network boot all the time, you can set like the following.

bootlist -m normal en0 bserver=10.106.101.1 gateway=10.106.101.250 client=10.106.101.5

where
'bserver' means the boot server
'client' means the machine that we are rebooting


h4. Where is the boot image?

The boot image is usually found in hd5 (the boot logical volume). The example below shows that the server booted from hd5 on hdisk0.

# bootinfo -v
hd5

# bootinfo -b
hdisk0

Attention: Never reboot the system when you suspect the boot image is corrupted.

h4. Recreation of boot image (Boot LV).

In the event you suspect the boot image is corrupted, you can recreate it using the following.

# bosboot -a -d /dev/hdisk0

If the command fails and you receive the following message:

    0301-165 bosboot: WARNING! bosboot failed - do not attempt to boot device.

try to resolve the problem using one of the following options, and then run the bosboot command again until you have successfully created a boot image:

* Delete the default boot logical volume (hd5) and then create a new hd5.
* Run diagnostics on the hard disk. Repair or replace, as necessary.

* If the bosboot command continues to fail, contact your customer support representative.
* Attention: If the bosboot command fails while creating a boot image, do not reboot your machine.
* When the bosboot command is successful, reboot your system to confirm.

Once done, update 'mini-ODM' in boot LV.

# savebase -v

Quote from Reference:
The bootrec (also known as the bootstrap) is read by a special part of the firmware called System ROS (the Read Only Storage, responsible for the initial preparation of the machine), and it (the bootrec) tells the ROS that it needs to jump X bytes into the disk platter to read the boot logical volume, hd5.

While reading the BLV, a mini-ODM is read into RAM. (Later, when the real rootvg file systems come online, AIX merges the data in the mini-ODM with the real ODM held in /etc/objrepos.)

When an LVM command changes the mini-ODM, the command 'savebase' needs to run as well. Savebase takes a snapshot of the ODM and compresses it.

h4. How all these gel together. (My understanding and value-adding, just in case people complain I copy too much) :P

After powering on, the server will POST.

Then it will use the boot list to find which disk, CD-ROM or network device to load the boot LV (hd5) from. The boot LV contains the AIX kernel, the rc.boot file, the commands required during the boot process, and the mini-ODM.

Next, the kernel takes over the boot process.

The kernel then loads up the file system before executing the init process (from the boot LV), which executes rc.boot. The rootvg is activated, and then the init process from the disk replaces the init process from the boot LV and becomes PID 1.

The kernel moves through the rc states and gets the system ready.


Thanks to this reference, which helped me understand more: <http://aix4admins.blogspot.sg/2011/08/mkitab-adds-record-to-etcinittab-file.html>

Friday, August 17, 2012

How to create empty file with a fixed size

In Linux, we might use fallocate or dd.
In Solaris, we use mkfile.

In AIX, we try the following

# /usr/sbin/lmktemp filename filesize

e.g.


# /:> lmktemp Log 104857600
Log

# /:>chown myusr:mygrp Log

# /:>ls -ltr
total 204808
-rw-r--r--    1 myusr   mygrp    104857600 Aug 15 17:12 Log

# /:>file Log
Log: commands text
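On platforms without lmktemp, dd gives the same result; this sketch writes the same 104857600 bytes (100 MB) of zeros:

```shell
# 100 MB of zeros, equivalent to: lmktemp Log 104857600
dd if=/dev/zero of=Log bs=1048576 count=100 2>/dev/null
```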


That's all folks!

Friday, August 10, 2012

How to extract PowerHA configuration from ODM for quick recovery

In AIX, the ODM holds a lot of information and configuration.

In the event that the ODM goes kaput, all hell breaks loose. Especially for PowerHA, having a backup of the configuration will aid in recovering from configuration issues.

As I'm still learning to use AIX and PowerHA, do let me know if my method is good enough. :)

I have written the script to extract powerHA configurations.

#!/bin/ksh
#
# Script Name : spool_HA_config.sh
# Written     : 08 Aug 2012
# Author      : Victor Kwan At gmail
#
# Description : This is to spool the powerHA 7 configuration on a
#               AIX 7.1 machine.
#               This script should be cron to run regularly for
#               quick recovery if powerHA configuration gets corrupted
#               in AIX ODM.
#
# Updates     : 08 Aug 2012 : First version
#             : 10 Aug 2012 : spooled files now uses DDMMYYYY_HHmmSS format
#


# Declarations
#
DATE=`date +'%d%m%Y_%H%M%S'`

# Safety Measure
#
WHO=`/usr/bin/whoami`

if [ ${WHO} != root ]
then
        echo "You shouldn't be running this as ${WHO}! Script will now terminate."
        exit 1
fi

#
# Spool the HA configuration from ODM
/usr/es/sbin/cluster/utilities/clsnapshot -c -i -n HA_snap_`hostname`_${DATE} -d "HA snapshot on ${DATE}" >/dev/null 2>&1

# Ends
# ~


The main star in this script is the clsnapshot command. By default, the output of clsnapshot command will be saved at /usr/es/sbin/cluster/snapshots.

Below is a sample of the files spooled. There are 2 files, one *.odm and one *.info. I think both are required when importing into PowerHA to recover from configuration issues.

-rw-r--r--    1 root     system        57482 Aug 10 01:00 HA_snap_serverA_10082012_010000.odm
-rw-r--r--    1 root     system        86579 Aug 10 01:00 HA_snap_serverA_10082012_010000.info



Of course, with many other things to keep watch on, we wouldn't want to run this script manually. Hence, put it in root's cron to run daily.

# PowerHA configuration daily spool
0 1 * * * /myscript_folder/spool_HA_config.sh >/myscript_folder/spool_HA_config.output 2>&1
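Since the snapshot pair is produced daily, the directory will keep growing. A hedged cleanup sketch that could be added to the script or to cron; the path comes from the post, while the 30-day retention is my assumption (the demo below runs against a scratch directory):

```shell
# SNAP_DIR is /usr/es/sbin/cluster/snapshots in real use; scratch dir for demo
SNAP_DIR=${SNAP_DIR:-$(mktemp -d)}

# Remove snapshot files older than 30 days (retention period is an assumption)
find "$SNAP_DIR" -name 'HA_snap_*' -type f -mtime +30 -exec rm -f {} \;
```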


And we are done.

Monday, July 02, 2012

How to resolve LVM error in powerHA

In the event you run into the following error:

    cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group. 

This may be caused by the volume group being varied on, on the other node. If it should not be varied on, on the other node, run:

    # varyoffvg vg 

And then retry the LVM command.

BUT if it continues to be a problem, then stop PowerHA 7.1 on both nodes, export the volume group and re-import it on both nodes, and then restart the cluster.

Saturday, June 30, 2012

Installing the IBM WebSphere Application Server (WAS) 6 Plugin on Apache 2.2.22

Learnt that plugin setup for WAS is different from the setting up in BEA weblogic environment.

WAS plugin require an installation while we just need to put in the plugin for BEA weblogic. Here's the steps to get WAS running.

Installing the Plugin in the web server.


Download the installation file from IBM, i.e. "Tiv_Middl_Inst_750_1of3_Linux_x86-64.tar", since I'm using Linux for my web server.

Transfer the tar file into the web server and unpack.

I was only able to install using the supplied GUI, so be prepared to export your display.

# cd linux64/WS-WAS_ND_7.0_Supplemental
# gunzip C1G36ML.tar.gz
# tar xfp C1G36ML.tar
# cd plugin
# export BROWSER=/usr/bin/mozilla
# export DISPLAY=10.10.10.16:0.0
# ./launchpad.sh
You need to have your Xwin or Xmanager ready.

My template cfg.xml file is now at "/opt/IBM/WebSphere/Plugins/config/myapp/plugin-cfg.xml"

Copy the configuring script to the WAS server.

There is a customised configuration script that you need to run on the WAS server to generate the real cfg.xml file. It is usually in the plugin bin directory, i.e. "/opt/IBM/WebSphere/Plugins/bin/configuremyappserver.sh".
Copy this to the bin directory of WAS, i.e. /opt/was/IBM/WebSphere/AppServer/bin, and run it.

Next, generate the plugin xml file on the WAS server.

root@myappserver:/opt/was/IBM/WebSphere/AppServer/bin> ./configuremyappserver.sh
Realm/Cell Name: <default>
Username: wasuser
Password:                                                                                                                                                    
WASX7209I: Connected to process "dmgr" on node myappserverCellManager01 using SOAP connector;  The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[myapp, APACHE, /usr/local/apache2222, /usr/local/apache2222/conf/httpd.conf, 80, MAP_ALL, /opt/IBM/WebSphere/Plugins, unmanaged, mywebserver-node, mywebserver, linux]"

Input parameters:

   Web server name             - myappserver
   Web server type             - APACHE
   Web server install location - /usr/local/apache2222
   Web server config location  - /usr/local/apache2222/conf/httpd.conf
   Web server port             - 80
   Map Applications            - MAP_ALL
   Plugin install location     - /opt/IBM/WebSphere/Plugins
   Web server node type        - unmanaged
   Web server node name        - mywebserver-node
   Web server host name        - mywebserver
   Web server operating system - linux

Creating the unmanaged node mywebserver-node .
Unmanged node mywebserver-node is created.

Creating the web server definition for myapp.
Web server definition for myapp is created.

Start computing the plugin properties ID.
Plugin properties ID is computed.

Start updating the plugin install location.
Plugin install location is updated.

Start updating the plugin log file location.
Plugin log file location is updated.

Start updating the RemoteConfigFilename location.
Plugin remote config file location is updated.

Start updating the RemoteKeyRingFileName location.
Plugin remote keyring file location is updated.

Start saving the configuration.

Configuration save is complete.

Computed the list of installed applications.

Processing the application myapp.
Get the current target mapping for the application myapp.
Computed the current target mapping for the application myapp.
Start updating the target mappings for the application myapp.
Target mapping is updated for the application myapp.

Start saving the configuration.

Configuration save is complete.

Transfer the plugin-cfg.xml file to the web server.

 scp /opt/was/IBM/WebSphere/AppServer/profiles/Dmgr01/config/cells/myappserverCell01/nodes/mywebserver-node/servers/myapp/plugin-cfg.xml user@mywebserver:/tmp/
The generated file is generally in "profiles_install_root/config/cells/cell_name/nodes/node_name/servers/web_server_name" directory
The place to put the plugin-cfg.xml file is generally in "plugins_install_root/config/web_server_name" directory

Start up Apache and test.

You should be good to go.
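For reference, the plugin is wired into Apache by two directives in httpd.conf. This is only a sketch with assumed paths: the installer normally adds these lines for you, and the module and config file names can differ by plugin version and platform.

```apache
# Load the WebSphere plugin module built for Apache 2.2 (path assumed)
LoadModule was_ap22_module /opt/IBM/WebSphere/Plugins/bin/mod_was_ap22_http.so
# Point the plugin at the generated configuration file (path assumed)
WebSpherePluginConfig /opt/IBM/WebSphere/Plugins/config/myapp/plugin-cfg.xml
```

Restart Apache after copying plugin-cfg.xml into place so the directives take effect.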

Friday, June 29, 2012

AIX powerHA auto-verification

powerHA 7.1 automatically runs a verification every night, usually around midnight. With a very simple command you can check the status of this verification run:

# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1

If this shows a return code of 0, the cluster verification ran without any errors. Anything else, and you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day.
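For a daily check, that grep can be wrapped in a small helper. `check_clverify` is a hypothetical function name, and it assumes the relevant clutils.log line containing "detected" ends with the numeric return code; the exact log layout may vary between powerHA releases.

```shell
# check_clverify: report the result of the last nightly cluster
# verification. Assumes the last log line containing "detected" ends
# with the numeric return code (layout may vary by powerHA release).
check_clverify() {
    log=${1:-/var/hacmp/log/clutils.log}
    rc=$(grep detected "$log" 2>/dev/null | tail -1 | awk '{print $NF}')
    if [ "$rc" = "0" ]; then
        echo "PASS"
    else
        echo "FAIL (rc=${rc:-none})"
    fi
}
# usage: check_clverify          or: check_clverify /path/to/clutils.log
```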

With the following smitty menu you can change the time when the auto-verification runs and if it should produce debug output or not:

    # smitty clautover.dialog
                                                        [Entry Fields]
* Automatic cluster configuration verification        Enabled                                                                                             +
* Node name                                           Default                                                                                             +
* HOUR (00 - 23)                                     [00]                                                                                                 +#
  Debug                                               yes        

You can check with:

    # odmget HACMPcluster
    # odmget HACMPtimersvc

Be aware that if you change the runtime of the auto-verification that you have to synchronize the cluster afterwards to update the other nodes in the cluster.

Source Reference (if the original author does not agree with me posting this on my blog, please let me know. :) )

http://www.aixhealthcheck.com/blog.php?id=116

Thursday, June 28, 2012

How to list Network statistics

This is useful across platforms such as Solaris, AIX and Linux to list the network statistics of the NICs. From these we can tell if there are any potential network issues and spawn off the necessary actions.

 
root@myserver:/> netstat -i
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
en2   1500  link#2      d2.48.a8.b8.c9.2    13862     0    14038     0     0
en2   1500  10.10.10   myserver          13862     0    14038     0     0
lo0   16896 link#1                          10644     0    10644     0     0
lo0   16896 127         loopback            10644     0    10644     0     0
lo0   16896 loopback                        10644     0    10644     0     0
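To spot problem interfaces without eyeballing the table, the output can be filtered with awk. `flag_if_errors` is a hypothetical helper; the column positions assume the fixed 9-column AIX layout shown above, so lines with fewer fields (like the lo0 link#1 line) are simply skipped.

```shell
# flag_if_errors: read `netstat -i` output on stdin and print any
# interface reporting input or output errors ($6 = Ierrs, $8 = Oerrs).
flag_if_errors() {
    awk 'NR > 1 && NF == 9 && ($6 > 0 || $8 > 0) {
        printf "%s: Ierrs=%s Oerrs=%s\n", $1, $6, $8
    }'
}
# usage: netstat -i | flag_if_errors
```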

How to check powerHA settings and events from ODM

In ODM, odmshow displays the definition of an object class when you query it. For example, here I queried the HACMPevent object class.

root@myserver:/> odmshow HACMPevent
class HACMPevent {
        char name[256];                              /* offset: 0xc ( 12) */
        char desc[256];                              /* offset: 0x10c ( 268) */
        short setno;                                 /* offset: 0x20c ( 524) */
        short msgno;                                 /* offset: 0x20e ( 526) */
        char catalog[256];                           /* offset: 0x210 ( 528) */
        char cmd[1024];                              /* offset: 0x310 ( 784) */
        char notify[1024];                           /* offset: 0x710 ( 1808) */
        char pre[1024];                              /* offset: 0xb10 ( 2832) */
        char post[1024];                             /* offset: 0xf10 ( 3856) */
        char recv[1024];                             /* offset: 0x1310 ( 4880) */
        short count;                                 /* offset: 0x1710 ( 5904) */
        long event_duration;                         /* offset: 0x1714 ( 5908) */
        };
/*
        descriptors:    12
        structure size: 0x1718 (5912) bytes
        data offset:    0x380
        population:     89 objects (89 active, 0 deleted)
*/

For example, to see which script is run when a node is attempting to join the cluster:

root@myserver:/> odmget -q name=node_up HACMPevent

HACMPevent:
        name = "node_up"
        desc = "Script run when a node is attempting to join the cluster."
        setno = 101
        msgno = 7
        catalog = "events.cat"
        cmd = "/usr/es/sbin/cluster/events/node_up"
        notify = ""
        pre = ""
        post = ""
        recv = ""
        count = 0
        event_duration = 0
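Since pre, post and notify are empty by default, the same stanza format can be scanned to find events that have custom scripts attached. `list_custom_events` is a hypothetical helper that assumes the stanza layout shown above.

```shell
# list_custom_events: read `odmget HACMPevent` output on stdin and print
# the names of events with a non-empty pre- or post-event script.
list_custom_events() {
    awk '/name =/  {gsub(/"/, "", $3); ev = $3}
         /(pre|post) = "[^"]/ {print ev}' | sort -u
}
# usage: odmget HACMPevent | list_custom_events
```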
 
 
And to list the powerHA 7.1 events from the ODM database:

root@myserver:/> odmget HACMPevent | awk '/name/ {print $3}' | sed 's/"//g'
swap_adapter
swap_adapter_complete
network_up
network_down
network_up_complete
network_down_complete
node_up
node_down
node_up_complete
node_down_complete
join_standby
fail_standby
acquire_service_addr
acquire_takeover_addr
get_disk_vg_fs
node_down_local
node_down_local_complete
node_down_remote
node_down_remote_complete
node_up_local
node_up_local_complete
node_up_remote
node_up_remote_complete
release_service_addr
release_takeover_addr
release_vg_fs
start_server
stop_server
config_too_long
event_error
reconfig_topology_start
reconfig_topology_complete
reconfig_resource_release
reconfig_resource_release_primary
reconfig_resource_release_secondary
reconfig_resource_acquire_secondary
reconfig_resource_complete_secondary
reconfig_resource_release_fence
reconfig_resource_acquire_fence
reconfig_resource_acquire
reconfig_resource_complete
migrate
migrate_complete
acquire_aconn_service
swap_aconn_protocols
get_aconn_rs
release_aconn_rs
server_restart
server_restart_complete
server_down
server_down_complete
rg_move
rg_move_release
rg_move_acquire
rg_move_fence
rg_move_complete
site_down
site_down_complete
site_down_local
site_down_local_complete
site_down_remote
site_down_remote_complete
site_up
site_up_complete
site_up_local
site_up_local_complete
site_up_remote
site_up_remote_complete
site_merge
site_merge_complete
site_isolation
site_isolation_complete
fail_interface
join_interface
cluster_notify
resource_add
resource_modify
resource_delete
resource_online
resource_offline
resource_state_change
resource_state_change_complete
external_resource_state_change
external_resource_state_change_complete
intersite_fallover_prevented
reconfig_configuration_complete
forced_down_too_long
start_udresource
stop_udresource
 

Wednesday, June 27, 2012

How to use iptrace

The iptrace command, like tcpdump or snoop, can be very useful to find out what network traffic flows to and from an AIX system.

You can use any combination of these options, but you do not need to use them all:
  • -a Do NOT print out ARP packets.
  • -s source IP Limit trace to source/client IP address, if known.
  • -d destination IP Limit trace to destination IP, if known.
  • -b Capture bidirectional network traffic (send and receive packets).
  • -p port Specify the port to be traced.
  • -i interface Only trace for network traffic on a specific interface.
Example:
Run iptrace on AIX interface en0 to capture port 80 traffic to file trace.out from a single client IP to a server IP:

root@myserver:/> iptrace -a -i en0 -s 10.10.10.19 -b -d 10.10.10.11 -p 80 /tmp/trace.out
[17957068]

This trace will capture both directions of the port 80 traffic on interface en0 between the client IP and server IP, and writes it to the raw file /tmp/trace.out.

To stop the trace:
root@myserver:/> ps -aef | grep iptra
    root 17957068        1   0 11:09:09      -  0:00 iptrace -a -i en0 -s 10.10.10.19 -b -d 10.10.10.11 -p 80 /tmp/trace.out 
 
root@myserver:/> kill -15 17957068
 
root@myserver:/> iptrace: unload success!
 
Leaving it running too long can consume a large amount of disk space!
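Because iptrace backgrounds itself, the PID printed at startup (or found via ps, as above) must be used rather than the shell's $!. `find_iptrace_pid` is a hypothetical helper for the lookup step; the bracketed pattern keeps the search from matching its own process in the ps output.

```shell
# find_iptrace_pid: extract the iptrace PID from `ps -ef` output read on
# stdin. The bracket in ipt[r]ace stops the pattern matching itself.
find_iptrace_pid() {
    awk '/ipt[r]ace/ {print $2; exit}'
}
# usage: stop the trace with SIGTERM so iptrace unloads cleanly
#   pid=$(ps -ef | find_iptrace_pid)
#   [ -n "$pid" ] && kill -15 "$pid"
```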

The ipreport command can be used to transform the trace file generated by iptrace into a human-readable format:
 
root@myserver:/> ipreport /tmp/trace.out > /tmp/trace.report
IPTRACE version: 2.0


++++++ END OF REPORT ++++++

processed 0 packets

Tuesday, June 26, 2012

How to resolve gethostbyaddr IPv6 error

What to do when sendmail logs "gethostbyaddr(IPv6:::1) failed: 1" warning messages to syslog?

In AIX 5.3 TL11 and AIX 6.1 TL4 and later, sendmail is IPv6 enabled. When sendmail attempts to resolve local interfaces, it will encounter the IPv6 loopback interface (::1) and perform an IPv6 lookup, which fails and thus the gethostbyaddr warning is logged to syslog.

To resolve this, add this entry to the /etc/hosts file:
 
::1 loopback localhost

Future releases of AIX will automatically include this entry in the /etc/hosts file.

Also, add the following entry to /etc/netsvc.conf:
hosts=local

How to determine File system creation time

To determine the time and date a file system was created, try this.

Find the LV for that file system.

Let's try /opt.

root@myserver:/> lsfs /opt
Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
/dev/hd10opt    --         /opt                   jfs2  10485760 --         yes  no

Since /opt is located on LV hd10opt, we then query the LV control block:
 
root@myserver:/> getlvcb -AT hd10opt
         AIX LVCB
         intrapolicy = c
         copies = 1
         interpolicy = m
         lvid = 00f603d800002c000000012f34187103.9
         lvname = hd10opt
         label = /opt
         machine id = 603C84A00
         number lps = 160
         relocatable = y
         strict = y
         stripe width = 0
         stripe size in exponent = 0
         type = jfs2
         upperbound = 32
         fs =
         time created  = Thu Aug 25 04:48:35 2011
         time modified = Fri Sep 23 10:16:13 2011

Now we can tell that the creation time, aka "time created", for /opt was in August 2011.
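The two lookups can be chained together. `lv_from_lsfs` is a hypothetical helper that assumes the two-line lsfs output layout shown above.

```shell
# lv_from_lsfs: read `lsfs <mountpoint>` output on stdin and print the
# bare logical volume name (second line, first column, /dev/ stripped).
lv_from_lsfs() {
    awk 'NR == 2 { sub("/dev/", "", $1); print $1 }'
}
# usage: getlvcb -AT "$(lsfs /opt | lv_from_lsfs)" | grep "time created"
```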

How to recreate BOOT LOGICAL VOLUME (BLV) in AIX

If a Boot Logical Volume (BLV) is corrupted, the machine will not boot (e.g. a bad block on a disk might cause a corrupted BLV).
To fix this situation, you must boot your machine in maintenance mode from a CD or tape. If NIM has been set up for the machine, you can also boot the machine from a NIM master in maintenance mode.

The boot lists are set using the bootlist command or through the System Management Services program (SMS); pressing F1 during boot enters SMS mode.

Then change the boot list for service (maintenance) mode so that the first device is the CD-ROM:

# bootlist -m service cd0 hdisk0 hdisk1

Then start maintenance mode for system recovery.

Access rootvg and start a shell from this volume group, then recreate the BLV using the bosboot command:

# bosboot -ad /dev/hdisk0

It's important that you do a proper shutdown; all changes need to be written from memory to disk.
 
# shutdown -Fr

Important: the bosboot command requires that the boot logical volume hd5 exists. If you need to create a BLV (maybe it was deleted by mistake), do the following:

1. boot your machine in maintenance mode,
2. Create a new hd5 logical volume of one PP, in rootvg, specifying boot as the logical volume type:
# mklv -y hd5 -t boot rootvg 1
3. Then run the bosboot command as described above.

If you have an HMC, select SMS as the boot mode in the partition's properties at boot time.