Tuesday, September 13, 2011

Oracle Database Listener not starting due to TNS-00525

I recently hit this issue while starting the Oracle DB listener on AIX.

TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.1.0 - Production
System parameter file is /opt/oracle/product/11.2/mydb/network/admin/listener.ora
Log messages written to /opt/oracle/diag/tnslsnr/myserver/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myserver)(PORT=1234)))
Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0)))
TNS-12555: TNS:permission denied
 TNS-12560: TNS:protocol adapter error
  TNS-00525: Insufficient privilege for operation
   IBM/AIX RISC System/6000 Error: 1: Not owner

Listener failed to start. See the error message(s) above...
I suspected the cause was that I had been configuring PowerHA on AIX and starting the DB through it. Thanks to blauecorsa, whose hint led me to the root cause. The output of "truss -aedfo result.txt lsnrctl start" showed some ECONNREFUSED errors. I checked and found that both /var/tmp/.oracle and /tmp/.oracle were owned by root:system, as below.
root> ls -l /tmp/ | grep oracle
drwxrwxrwt    2 root   system             256 Sep 13 13:03 .oracle

root> ls -l /var/tmp/ | grep oracle
drwxrwxrwt    2 root   system             256 Sep 07 13:58 .oracle
So we can confirm that PowerHA started the DB by running the startup / shutdown scripts under the root account. The startup / shutdown scripts were modified to use su - oracle -c "" and we are back in business. One last question: can PowerHA assume the oracle account to start / shut down the database, so that no change to the scripts is needed?
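The script change can be sketched roughly as below. This is only an illustration under assumptions, not the actual PowerHA script: the helper name wrap_cmd and the "lsnrctl start" example are made up here, and the helper only prints the command line it would use, so it is safe to try as a dry run.

```shell
# Hypothetical helper: print the command a start/stop script should execute.
#   $1 = numeric uid of the caller (e.g. from `id -u`)
#   $2 = account that owns the Oracle software (assumed to be "oracle")
#   $3 = command to run
wrap_cmd() {
    if [ "$1" -eq 0 ]; then
        # Called as root (as PowerHA did here): switch to the owner first.
        printf 'su - %s -c "%s"\n' "$2" "$3"
    else
        # Not root: print the command unchanged.
        printf '%s\n' "$3"
    fi
}

# Dry run: show what would be executed for the current user.
wrap_cmd "$(id -u)" oracle "lsnrctl start"
```

The point of the design is that files such as /tmp/.oracle are then created by the oracle account rather than by root.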

Monday, September 12, 2011

Listing and Verifying SSL Certificate using openssl

How to check the private key

Using openssl command,

# openssl rsa -in myserver.key -check -noout
RSA key ok
How to check SSL certificate (*.crt format) using openssl command

Using openssl command

 # openssl x509 -in myserver.crt -text -noout | more
To get only the validity dates
 # openssl x509 -in myserver.crt -text -noout | grep "Not"
            Not Before: Apr 24 14:35:54 2010 GMT
            Not After : Jul 27 04:17:47 2011 GMT
How to check SSL certificate (*.cer format)

Using openssl command,
 # openssl x509 -inform der -in myserver.cer -text | more 
To get only the validity dates,
 # openssl x509 -inform der -in myserver.cer -text | grep "Not"
            Not Before: Nov 14 05:44:26 2008 GMT
            Not After : Nov 14 05:54:26 2010 GMT
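If only the expiry date is needed, for example to feed another script, the grep above can be tightened a little. The sketch below is an assumption-laden helper (the name not_after is mine); it reads the -text output on stdin and prints just the date. As far as I know, openssl x509 also accepts -enddate together with -noout, which prints the same date as a notAfter= line.

```shell
# Read `openssl x509 -text` output on stdin and print only the expiry date.
not_after() {
    sed -n 's/^ *Not After *: *//p'
}

# Example with the two lines shown above (no certificate file needed):
printf '%s\n' \
    '            Not Before: Apr 24 14:35:54 2010 GMT' \
    '            Not After : Jul 27 04:17:47 2011 GMT' | not_after
```

With a real certificate the usage would be: openssl x509 -in myserver.crt -text -noout | not_after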

SSL Certificate Monitoring

Here's a guide on how to set up monitoring for SSL certificates.

It is important to ensure that the SSL certificates used by user-facing services or for secure communication stay valid; otherwise, we risk a service outage because of an expired certificate.

The SSL monitoring script

Using the monitoring script "SSL Certificate Check" written by Matty, we can use it to monitor the SSL certificates either by verifying the certificate itself or querying the status through the application services.

Link to detailed documentation at SSL Certificate Check

How to set it up.

I am using the script v3.21 dated Oct 2010 in the example.

1) Download at SSL Certificate Check Script

2) Deploy it to a suitable location and give it execute permission (at least 0700).

I have configured the script to the following

# Who to page when an expired certificate is detected (cmdline: -e)
ADMIN="admin@myserver.com"

# Number of days in the warning threshhold  (cmdline: -x)
WARNDAYS=100

# If QUIET is set to TRUE, don't print anything on the console (cmdline: -q)
QUIET="FALSE"

# Don't send E-mail by default (cmdline: -a)
ALARM="TRUE"

# Don't run as a Nagios plugin by default (cmdline: -n)
NAGIOS="FALSE"
With these settings, the script sends an email notification when a certificate has fewer than 100 days of validity left. It also prints to the console; if you don't need that, change QUIET to "TRUE".

If you need to override the default settings in the script, you can use the following switches:

#./sslcertcheck
Usage: ./sslcertcheck [ -e email address ] [ -x days ] [-q] [-a] [-b] [-h] [-i] [-n] [-v]
       { [ -s common_name ] && [ -p port] } || { [ -f cert_file ] } || { [ -c certificate file ] }

  -a                : Send a warning message through E-mail
  -b                : Will not print header
  -c cert file      : Print the expiration date for the PEM or PKCS12 formatted certificate in cert file
  -e E-mail address : E-mail address to send expiration notices
  -f cert file      : File with a list of FQDNs and ports
  -h                : Print this screen
  -i                : Print the issuer of the certificate
  -k password       : PKCS12 file password
  -n                : Run as a Nagios plugin
  -p port           : Port to connect to (interactive mode)
  -s commmon name   : Server to connect to (interactive mode)
  -q                : Don't print anything on the console
  -v                : Only print validation data
  -x days           : Certificate expiration interval (eg. if cert_date < days)
Requirements.

The mktemp package needs to be available on the server.

Usage

1) Running the script against the certificate file.

$ sslcertcheck -c /etc/httpd/conf/ssl.crt/abc.pem
Host                                            Status       Expires      Days Left
----------------------------------------------- ------------ ------------ ----------
FILE:/etc/httpd/conf/ssl.crt/abc.pem            Valid        Jan 2 2010   807   
sslcertcheck will print the file or hostname in the first column, a value to indicate if the certificate is valid in the second column, the date the certificate will expire in the third column, and the number of days remaining until the certificate expires in the fourth column.

2) If you do not have local access to the certificate files, you can use sslcertcheck's network connectivity option to extract the certificate expiration date from a live server. To check when the certificate used by the web server will expire, the server name or IP address and a port number can be passed to sslcertcheck's "-s" (server name) and "-p" (tcp port) options:

#./sslcertcheck -s 172.2.1.1 -p 443

Host                                            Status       Expires      Days
----------------------------------------------- ------------ ------------ ----
172.2.1.1:443                               Valid        Jul 24 2011  128
3) If you manage dozens of SSL-enabled servers, you can place the server names and port numbers in a file, and run sslcertcheck against that file:

The configuration file.

$ cat sslcertcheck.cfg
10.10.8.1 443
10.10.8.7 443
172.2.1.1 443
The output from the script, with -e setting whom to email when any entry's validity falls below the threshold.

# ./sslcertcheck -e admin@me.com -f sslcertcheck.cfg 

Host                                            Status       Expires      Days
----------------------------------------------- ------------ ------------ ----
10.10.8.1:443                                 Valid        Nov 14 2013  972
10.10.8.7:443                                 Valid        Jan 20 2014  1039
172.2.1.1:443                               Valid        Jul 24 2011  128
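If you would rather post-process the report yourself than rely on the e-mail alarm, the "Days" column is easy to filter. The helper below is a sketch of my own (the name expiring_soon is an assumption, it is not part of sslcertcheck); it reads the report on stdin and prints the hosts whose remaining days are below a threshold.

```shell
# Print the hosts whose last column (days left) is below the threshold in $1.
# Header and separator lines are skipped because their last field is not a number.
expiring_soon() {
    awk -v limit="$1" '$NF ~ /^-?[0-9]+$/ && $NF + 0 < limit + 0 { print $1 }'
}

# Example using the report lines shown above, with a 200-day threshold:
printf '%s\n' \
    'Host            Status  Expires      Days' \
    '--------------- ------- ------------ ----' \
    '10.10.8.1:443   Valid   Nov 14 2013  972' \
    '10.10.8.7:443   Valid   Jan 20 2014  1039' \
    '172.2.1.1:443   Valid   Jul 24 2011  128' | expiring_soon 200
```

Against a live run this would be used as: ./sslcertcheck -f sslcertcheck.cfg | expiring_soon 200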
That's all, folks!

Basic networking TCP test using telnet

When telnetting to an IP at a given port, there are various telnet responses. Knowing the difference between them can quickly point you in the right direction when a telnet to a host on a particular port is unsuccessful.

There is a distinct difference between getting a ‘refused’ and a ‘timeout’ response.

You will get a connection refused message for one of the following reasons:

  • The application you are trying to test hasn’t been started/installed on the remote server.
  • There is a firewall rejecting the connection attempt by terminating the connection setup.
Example output from a Linux box:
$ telnet server2 7063
Trying 172.1.1.1...
telnet: connect to address 172.1.1.1: Connection refused
telnet: Unable to connect to remote host: Connection refused
The similar Connection refused message from a Solaris box :
$ telnet server3 7055
Trying 172.2.1.1...
telnet: Unable to connect to remote host: Connection refused
The Connect failed message is the equivalent but from a Windows box :
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\vickwan>telnet 172.2.1.1 7062
Connecting To 172.2.1.1...Could not open connection to the host, on port 7062: Connect failed
The telnet command will abort the attempted connection after waiting a predetermined time for a response. This is called a timeout response.

In some cases, telnet won’t abort, but will just wait indefinitely. This is also known as hanging. These symptoms can be caused by one of the following reasons:
  • The remote server doesn’t exist on the destination network. It could be turned off.
  • There could be a routing issue: either the request or the response never reaches its destination.
  • A firewall could be blocking the connection attempt, causing it to timeout instead of being quickly refused.
Here is an example of the output:
$ telnet server3 7055
Trying 172.2.1.1...
telnet: connect to address 172.2.1.1: Connection timed out
telnet: Unable to connect to remote host: Connection timed out
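The refused / timeout distinction above can be turned into a small classifier. This is a minimal sketch, assuming the message texts quoted in this post (the AIX wording comes from the comments in the script further down); the function name classify_telnet is my own.

```shell
# Classify the last line printed by telnet into refused / timeout / no-route / other.
classify_telnet() {
    case "$1" in
        *"Connection refused"*|*"refused an attempted connect operation"*)
            echo refused ;;
        *"Connection timed out"*|*"did not respond within the timeout period"*)
            echo timeout ;;
        *"No route to host"*)
            echo no-route ;;
        *)
            echo other ;;
    esac
}

classify_telnet "telnet: Unable to connect to remote host: Connection refused"
```

A "refused" result points at the application or a rejecting firewall, while a "timeout" result points at routing, a dead host or a firewall that silently drops packets.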
The script, command file and input file.

Reference Adapted from : http://blog.ru.co.za/2009/09/29/telnet/

This little script is written to help cut down the time needed to test whether an ACL allows connections from server A to server B at a given port. It will attempt to suggest remedial actions.

Script tested on AIX 6.1, AIX 7.1, RHEL 4.6 and Solaris 9.

#!/bin/ksh
# Written By   : Victor Kwan
# Written On   : 25 Oct 2009
# EMAIL   : victorkk [AT] gmail [DOT] com
# Description  : Utility to test TCP ACL via telnet
# Updated On   : 27 Oct 2009 : Attempt to interpret telnet response.
#              : 09 Mar 2011 : Test if telnet command is executable
#              : 07 Apr 2011 : Support AIX, Improve code to be not chatty and terminate telnet session properly.

FILE=${1}
OUTPUTFILE="$0.output"
LOGFILE="$0.log"
TELNETCMD="$0.telnetcmd"
TELNET=`which telnet`
CAT=`which cat`
ECHO=`which echo`
OS="`uname -s`"

#UNIX Normal "Connection to 10.106.50.10 closed."
#UNIX No route "No route to host"
#UNIX Conn refused "telnet: Unable to connect to remote host: Connection refused"
#UNIX timed out "telnet: Unable to connect to remote host: Connection timed out"

RESPONSE_NORMAL="gn host."
RESPONSE_NO_ROUTE="to host"
RESPONSE_CONN_REFUSED="refused"
RESPONSE_TIMED_OUT="med out"

#AIX Normal "Connection closed."
#AIX No route "No route to host"
#AIX Conn refused "telnet: connect: A remote host refused an attempted connect operation."
#AIX timed out "telnet: connect: A remote host did not respond within the timeout period."

AIXRESPONSE_NORMAL="Connection closed."
AIXRESPONSE_NO_ROUTE="No route to host"
AIXRESPONSE_CONN_REFUSED="connect operation."
AIXRESPONSE_TIMED_OUT="he timeout period."

THISRESPONSE_NORMAL="$RESPONSE_NORMAL"
THISRESPONSE_NO_ROUTE="$RESPONSE_NO_ROUTE"
THISRESPONSE_CONN_REFUSED="$RESPONSE_CONN_REFUSED"
THISRESPONSE_TIMED_OUT="$RESPONSE_TIMED_OUT"

COLOR_BLUE="\033[0;34m"
COLOR_GREEN="\033[32m"
COLOR_RED="\033[31m"
COLOR_BRIGHTRED="\033[1;31m"
COLOR_WHITE="\033[0m"
COLOR_BRIGHTWHITE="\033[1;37m"

if [ ! -x "$TELNET" ]
then
        echo "${COLOR_BRIGHTRED}Telnet command is not executable!!${COLOR_WHITE}"
        echo "${COLOR_WHITE}Script will now terminate.${COLOR_WHITE}"
        exit 1
fi

echo "Commence telnet test based on [$FILE] file."
echo

cat ${FILE} | grep -v "#" | while read LINE
do
{
        IP=`echo $LINE | awk -F: '{print $1}'`
        PORT=`echo $LINE | awk -F: '{print $2}'`

        ($CAT $TELNETCMD) | $TELNET $IP $PORT >> $OUTPUTFILE 2>&1

        # Keep only the last line telnet printed, without CR/LF.
        RESPONSE=`tail -1 $OUTPUTFILE | tr -d "\r" | tr -d "\n"`

        # AIX telnet words its messages differently from Solaris and Linux.
        if [ "$OS" = "AIX" ]
        then
        {
                THISRESPONSE_NORMAL="$AIXRESPONSE_NORMAL"
                THISRESPONSE_NO_ROUTE="$AIXRESPONSE_NO_ROUTE"
                THISRESPONSE_CONN_REFUSED="$AIXRESPONSE_CONN_REFUSED"
                THISRESPONSE_TIMED_OUT="$AIXRESPONSE_TIMED_OUT"
        }
        fi

        # Compare each expected message against the end of the response.
        case "$RESPONSE" in
        *"$THISRESPONSE_NORMAL")
                ;;
        *)
                echo "Telnet ${COLOR_BRIGHTRED}FAILED${COLOR_WHITE} for ${COLOR_BRIGHTWHITE}$IP:$PORT${COLOR_WHITE}."
                echo "${COLOR_BRIGHTRED}Error Message${COLOR_WHITE} : [$RESPONSE]!"

                case "$RESPONSE" in
                *"$THISRESPONSE_NO_ROUTE")
                        echo "${COLOR_GREEN}Suggestion${COLOR_WHITE}: Check routing at both source and destination"
                        ;;
                *"$THISRESPONSE_CONN_REFUSED"|*"$THISRESPONSE_TIMED_OUT")
                        echo "${COLOR_GREEN}Suggestion${COLOR_WHITE}: Destination may not be listening, routable or firewall is blocking the connection."
                        ;;
                esac
                ;;
        esac
        echo "Done for $IP:$PORT."
        echo " "
}
done
echo "Telnet test ends."
For the input file (e.g. IP_PORT), it is okay to have commented lines, as the script will ignore them.
~$ more IP_PORT
#WLS
server2:7003
server2:7004
server2:7022
server2:7023
...
...
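For the command file, the 2 telnet control commands below must be used. The script looks for it at "$0.telnetcmd", i.e. the script's own name with .telnetcmd appended.
~$ more testACL_telnet.ksh.telnetcmd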
^]
quit
Final outcome. Output may look similar to the following. No output for telnet success.
> ./testACL_telnet.ksh IP_PORT
Commence telnet test based on [IP_PORT] file.
Telnet FAILED for server4:7053.
Error Message : [telnet: Unable to connect to remote host: Connection refused]!
Suggestion: Destination may not be listening on this IP and Port, routable or firewall is blocking the connection.
...
...
...
Telnet test ends.

Swap partition in AIX

By default, swap space is defined as 512 MB, which is a little low when Oracle, WebSphere or any other enterprise application will be running. Hence, it is important to increase the swap space before installing these applications.

 lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
hd6             hdisk0            rootvg       512MB     1   yes   yes    lv     0
AIX puts swap space on hd6 by default.

To increase the swap space, use the 'chps -s xxx hd6' command, where xxx is the number of PP by which you want to increase it.
lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
hd6             hdisk0            rootvg       12480MB     1   yes   yes    lv     0
swap -l
device              maj,min        total       free
/dev/hd6              10,  2     12480MB     12448MB
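Since chps -s wants a number of partitions rather than megabytes, a little arithmetic is needed first. The sketch below is my own helper (pps_to_add is a made-up name) and it assumes a PP size of 64MB, which this post does not state; check the real value in the PP SIZE field of lsvg rootvg before using the result.

```shell
# Number of partitions to pass to `chps -s`, rounded up.
#   $1 = current paging space in MB
#   $2 = wanted paging space in MB
#   $3 = PP size of the volume group in MB (assumed value in the example)
pps_to_add() {
    echo $(( ($2 - $1 + $3 - 1) / $3 ))
}

# Growing hd6 from 512MB to 12480MB with an assumed 64MB PP size:
pps_to_add 512 12480 64
# The result would then be used as: chps -s <result> hd6
```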
tmp in AIX

On Solaris, /tmp uses tmpfs, which is essentially backed by swap. On AIX, however, /tmp uses the usual 'jfs2' file system. This means that whatever is placed in /tmp/ is not wiped out after a reboot, because the space is really allocated on disk.

AIX puts /tmp on hd3 by default.
df -k | grep tmp
/dev/hd3          6291456   2706240   57%     4993     1% /tmp
See that the files from 2010 are still around.
ls -lt /tmp/ | tail -5
d-w-------    2 root     system          256 Nov 12 12:12 errmbatch
-rw-r--r--    1 root     system          530 Nov 12 11:33 IBM.CSMAgentRM_dr.sh.dbg
-rw-------    1 root     system            0 Nov 12 11:30 .strload.mutex
-rw-r--r--    1 root     system          406 Nov 12 11:26 .sr_migrate.log
drwx------    2 root     system          256 Nov 12 10:41 lost+found
Uptime is only 15 days, which proves that the above files from 2010 really were not wiped out by the reboot.
uptime
  02:59PM   up 15 days,   3:03,  3 users,  load average: 1.50, 1.77, 1.87
Timestamp for the sake of completeness.
date
Thu Apr  7 14:59:55 GMT+08:00 2011
Good or bad? I don't think this is critical to the performance or operation of the system, hence I recommend leaving it alone.

Why SSHD account cannot be removed

Modern SSHD provides a privilege separation security feature, which lets SSHD create an unprivileged child process to deal with incoming network traffic. After successful authentication, another process is created that has the privileges of the authenticated user. The purpose of privilege separation is to prevent privilege escalation by containing any corruption within the unprivileged processes.

The default setting in SSHD is 'yes', meaning it is enabled. Hence, SSHD requires the 'sshd' account and the 'sshd' group.

How to Remove Unwanted route to 169.254.0.0 in RHEL Linux

Every time the system boots, you may see the following route to 169.254.0.0.

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.2.0      *               255.255.255.0   U     0      0        0 bond0
10.10.2.0      *               255.255.255.0   U     0      0        0 eth3
169.254.0.0     *               255.255.0.0     U     0      0        0 eth3
default         10.10.2.254    0.0.0.0         UG    0      0        0 bond0
This is the zeroconf route (169.254.0.0). You can disable it manually by turning off the firewall and removing the route to 169.254.0.0 / 255.255.0.0 with the route command.

Permanent Solution:

To disable the zeroconf route during system boot, edit the /etc/sysconfig/network file and add the following NOZEROCONF value to the end of the file:
NETWORKING=YES
HOSTNAME=localhost.localdomain
NOZEROCONF=yes
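If the same change has to be rolled out to several servers, the edit can be scripted so that running it twice does no harm. This is only a sketch: the function name set_nozeroconf is mine, and the example deliberately works on a scratch file instead of the live /etc/sysconfig/network.

```shell
# Make sure the given file sets NOZEROCONF=yes without appending duplicates.
#   $1 = path to the network file (on RHEL: /etc/sysconfig/network)
set_nozeroconf() {
    if grep -q '^NOZEROCONF=' "$1"; then
        # Rewrite the existing setting in place, keeping the file's permissions.
        sed 's/^NOZEROCONF=.*/NOZEROCONF=yes/' "$1" > "$1.tmp" &&
            cat "$1.tmp" > "$1" && rm -f "$1.tmp"
    else
        echo 'NOZEROCONF=yes' >> "$1"
    fi
}

# Try it on a scratch copy rather than the live file:
demo=$(mktemp)
printf 'NETWORKING=YES\nHOSTNAME=localhost.localdomain\n' > "$demo"
set_nozeroconf "$demo"
set_nozeroconf "$demo"   # second call must not add a duplicate line
cat "$demo"
rm -f "$demo"
```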
Layman Explanation:

Zeroconf, or Zero Configuration Networking, is a set of techniques that automatically create a usable IP network without configuration or special servers.

This allows inexpert users to connect computers, networked printers, and other network devices and expect a functioning network to be established automatically. Without Zeroconf, a user must either set up special services, like DHCP and DNS, or set up each computer's network settings manually, which may be challenging for non-technical or novice users.

Additional Information: wiki more about zeroconf

How to transfer MQ objects from one site to another using saveqmgr

Question

You used the SupportPac MS03 to save your queue manager object definitions to a file that you named TEST.TST. You want to use the saveqmgr output file (TEST.TST) to create the same queue manager object definitions on a new server machine.

What's the process to redefine your objects?
First, copy the saveqmgr output file (TEST.TST) onto the new server machine.

Create the queue manager

crtmqm TEST
Start the queue manager
strmqm TEST
Define your objects
runmqsc TEST < TEST.TST
where TEST is the name of the new QMGR and TEST.TST is the name of the queue manager object definitions file. Reference: MQ Reference

Simple AIX LVM concept and Disk structure

The LVM controls everything here through the logical volume device driver. It takes a complex structure of PP, including mirroring, and presents a simple LP to the application.

Physical Volume (hdiskX)

Each physical disk is called a physical volume (PV).

Every PV in use belongs to a VG unless the PV is used as raw storage device or a hot spare.
Each PV contains 1 or more disks (or platters) stacked on top of each other.

PV is also known as direct access storage devices (DASDs). It can be fixed or removable.
A block (sector) is a continuous 512-byte region on PV that corresponds in size to a DASD sector.
A Partition is a set of blocks (with sequential cylinder, head, and sector numbers) contained within a single PV.

Physical Partition

Each PV is divided into PP of a fixed size (e.g. 4 MB partition size).
This is the smallest unit a PV can have.

Volume Group (rootvg, datavg)

Each VG is a collection of 1 or more PV.

A VG can consist of any mixture of physical disk types, though performance is most consistent if devices of the same size are used in a single VG.

PV cannot be shared between VG, therefore entire PV becomes part of the VG.

Each system can have up to 32 VG only.
Information about all LVs and PVs within a VG is stored in the VGDA area.
The first 64K of a PV is reserved for this area as defined in .
The VGDA consists of bootrecord (first 512 bytes), bad block directory () and LVM record ()

Information about which PP are stale and which PV are missing within a VG is stored in the VGSA area.
The LVM and SCSI driver reserve somewhere between 7 - 10% of the available disk space for LVM maps, etc.

Let's go deeper.

With each VG, 1 or more LV are defined.

Logical Volume (hd1, hd2var, etc)

Each LV groups together 1 or more Logical Partitions (LP).

LV is an area of disk used to store data, appearing contiguous to the application but may not be contiguous on the actual PV.
A file system resides on top of an LV; only 1 LV is mapped to a file system.
LV reside only within VG.
LV can span multiple PV within the VG.
LV can also have mirrored copies on different PV within the VG.
Each VG can contain up to 255 LV.

All system LV should reside in the VG named rootvg, under 1 PV if possible. This will allow complete re-installation of the system from backup without affecting the application data that resides on other VG.

There is no difference in response time between a single or multiple VG. When a LV is accessed, the ODM db (/etc/objrepos/*) is searched to determine which VG the LV belongs to, then the VGDA for the VG is read to determine the physical placement of the LV on that PV and ultimately the physical track and sector where the data resides.

Logical Partition

Each LP contains 1 or more physical partitions (PP).

If LV is mirrored, then additional PP are allocated.
These mirrors usually reside on different PV (for availability) but may reside on same PV (for performance).

Up to 2 mirror copies of a PP can be kept, so each LP maps to up to 3 PP (2 mirrors plus the original).
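To get a feel for how LP, PP and mirroring add up, here is a small worked calculation. It is a sketch with made-up numbers (a 1000MB LV, the 4MB PP size used as an example earlier, and 3 copies); the function name pps_for_lv is an assumption of mine.

```shell
# PPs consumed by a logical volume.
#   $1 = LV size in MB
#   $2 = PP size in MB
#   $3 = number of copies (1 = no mirror, 2 or 3 = mirrored)
pps_for_lv() {
    lps=$(( ($1 + $2 - 1) / $2 ))   # LPs needed, rounded up
    echo $(( lps * $3 ))            # each LP maps to one PP per copy
}

# 1000MB LV, 4MB PP size, original plus 2 mirrors:
pps_for_lv 1000 4 3
```

So mirroring does not change the LP count seen by the application; it only multiplies the PP consumed on the PV.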
Space Region in physical Volume

For space allocation purposes, each PV is divided into 5 regions, namely the outer_edge, outer_middle, center, inner_middle, and inner_edge. These can be viewed as cylindrical segments cut perpendicularly through the disk platters. The number of physical partitions in each region depends on the total capacity of the disk drive.

AIX Useful commands

For my own reference

Commands for memory

Shows how much RAM my machine has (as root)
bootinfo -r
Shows how much RAM my machine has (as non root)
lsattr -E -l sys0 -a realmem

Simulates a system with various sizes of memory for performance testing of applications.

To set the memory size to 512 MB
rmss -c 512
To reset the memory size to the original one
rmss -r 

Commands for Disk / file system

Shows all disks
lsdev -Ccdisk
Gives info about the disk manufacture type
lscfg -vp -l hdisk0 | grep Machine
Displays disk size (even if no volume group is assigned)
bootinfo -s hdisk0
Displays information about a physical volume within a volume group (Display PP's used, location on disk, mount point)
lspv -p hdisk0
List of Filesystem items of jfs2 file system type.
lsfs -v jfs2  
List the parameter the filesystem of jfs2 file system type
lsfs -q -v jfs2

From here, we can see if e.g. /backup was or is a big_filesystem_enabled one. Important for the 2GB File limit.

Displays all exported volumes
showmount -e
Show who's got my file systems mounted over IP !
showmount -a
Shows all mounted filesystems (nfs+local)
mount
Synchronize the device configuration database (ODM) with the logical volume manager information for rootvg
synclvodm -vP rootvg
Redefines the VG definition in ODM
redefinevg rootvg
Reads logical volumes info from disk
lqueryvg -p hdisk0 -Avt
Increases /var FS by about 100MB (each unit is 512 bytes)
chfs -a size=+200000 /var
Mounts all FS stated in /etc/filesystems
mount all
Unmounts a FS
umount /mnt/
Identifies processes using a file or file structure
fuser -c /dataVolumes/test
Releases a CD that will not unmount by sending SIGKILL signal to each local process. Must be root.
fuser -k /dev/cd0
Shows when the file was last created/modified/accessed
istat 
Reports input/output statistics for logical partitions, logical volumes and volume groups every 2 seconds
lvmstat -v hd2 2
Enable / Disable LVM statistic collection
lvmstat -e hd2

lvmstat -d hd2
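A note on the chfs example above: because each unit is 512 bytes, +200000 is really 102,400,000 bytes, or a bit under 98MB. The helper below (mb_to_blocks is my own name for it) is a quick sketch for converting an exact number of megabytes into 512-byte blocks.

```shell
# Convert megabytes to the 512-byte blocks used in chfs size values.
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Exactly 100MB expressed in 512-byte blocks:
mb_to_blocks 100
# Would be used as: chfs -a size=+<result> /var
```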

Commands for Network

Displays en0 driver params
lsattr  -El  en0
Displays ent0 Hardware params
lsattr  -El  ent0
Display specific device specified by logical name ent0
lscfg  -v -l ent0
Display firmware device tree for the corresponding node to the ent0 device
lscfg -vp -lent0
Displays en0 interface settings (address, netmask, flags)
ifconfig en0
Displays network interfaces setting
netstat  -i
Display all interface information - host:addr:mask:_rawname:nameserv:domain:gateway:cost:activedgd:type:start
for i in `lsdev | grep en | grep "Standard Ethernet" | awk '{print $1}'`; do echo Checking $i;mktcpip -S $i; done
Displays all IP oriented processes status
lssrc -g tcpip
Display any communication errors on ent2
entstat -drt ent2 |grep -i error
Resets all the statistics back to their initial values.
entstat -r
Shows a local arp cache
arp -a
Shows TCP statistics
netstat  -ptcp
Shows UDP statistics
netstat  -pudp
Show network buffer cache statistic
netstat  -c
Show network statistic for each protocol
netstat  -s
Shows statistics recorded by the memory management routines
netstat -m
Trace en2 every 3 seconds
netstat -I en2 3
Ping with displaying the routing info
ping -R -c 1 myserver
Display routing info
netstat -rn
Shows the state of all configured interfaces
netstat -in
Display routing info with full hostnames
netstat -r
Trace all hops (interconnections=routers) to the destination IP
traceroute 149.115.39.1

Monitors activity and reports statistics on network I/O and network-related CPU usage.

To start
netpmon -o netpmon.out
To Stop
trcstop
Displays a fully qualified domain name of a host
namerslv -s | grep domain | awk '{ print $2 }'
Shows the status of a remote host on the local network
rup myserver

Commands for Tape Device

Displays tape params
lsattr -El rmt0
Shows all information about a tape drive
lscfg -vp -l rmt0
Shows all tapes
lsdev -Cctape

Commands for NFS

Resets NFS stats without reboot
nfsstat -z
To stop / start NFS services on a client
stopsrc -g nfs

startsrc -g nfs
Mount an NFS filesystem
mount hostname:/filesystem /mount-point
Exports a directory to NFS clients
mknfsexp -d /directory
Mounts a directory from an NFS server
mknfsmnt
Changes the configuration of the system to stop running NFS daemons
rmnfs 
Configures the system to run NFS
mknfs
Un-exports a filesystem
exportfs -u (filesystem)
Lists all exported filesystems
exportfs
Exports all directories listed in /etc/exports file
exportfs -a

Commands for Devices

Displays system type, firmware, etc driver params
lsattr  -El  sys0
Lists all system HW config (NVRAM)
lscfg -v
List all scsi devices
lsdev -Csscsi
List all pci devices
lsdev -Cspci
List all scsi adapters
lsparent -Ck scsi
List fiberchannel devices
lsdevfc
Configures devices
cfgmgr
This is similar to devfsadm in Solaris.
  -v                : Specifies verbose output. The cfgmgr command writes information about what it is doing to standard output.
  -l Name           : Specifies the named device to configure along with its children.
cfgmgr -v -l device
If you only turned on a disk tower at e.g. scsi2, cfgmgr -v -l scsi2 will configure only that device, with detailed output.

Commands for Graphic adapter

To check which graphic adapter is installed.
lsdisp
List all information about an adapter
lscfg -vp -l mga0

Commands for boot

Change the default bootlist
bootlist -m normal cd0  rmt0 hdisk0

Commands for Firmwares

Display the system firmware level and service processor
lsmcode -c
Display the adapter microcode levels for a RAID adapter scraid0
lsmcode -r -d scraid0
Display the microcode level for all supported devices
lsmcode -A

Commands for System Information

Get machine ID
/usr/bin/uname -m
Get platform type
/usr/bin/uname -M
Determines the system serial number
lsattr -El sys0 -a systemid | awk '{print $2}' | awk -F, '{print $2}'
Displays current AIX level
oslevel
Displays current AIX maintenance level
oslevel -r
List filesets at levels later than maintenance level !!!
oslevel -g
List all information about a processor
lscfg -vp -l proc0  (1,2,3)   
List all information about memory modules installed
lscfg -vp -l mem0 |pg
Determines the system Firmware level
lscfg -vp|grep ROM|grep -v CD
Check last system dump status
sysdumpdev -L
Check system dump device settings
sysdumpdev -l
List contents of the package
lslpp -f Upd_Timna_DTM.obj
Shows create/modify/access file info
istat 
Display system boot log
alog -o -t boot | more
Identifies the users currently logged in
who
Shows the 5 most recent login records
last -5
Shows username ‘root’ login/logout record
last root
Displays local system statistics in interactive mode and records system statistics in recording mode
nmon
The nmon command provides the following views in interactive mode:
    * System resource view (using the r key)
    * Process view (using the t and u keys)
    * AIO processes view (using the A key)
    * Processor usage small view (using the c key)
    * Processor usage large view (using the C key)
    * Shared-processor logical partition view (using the p key)
    * NFS panel (using the N key)
    * Network interface view (using the n key)
    * WLM view (using the W key)
    * Disk busy map (using the o key)
    * Disk groups (using the g key)
    * ESS vpath statistics view (using the e key)
    * JFS view (using the j key)
    * Kernel statistics (using the k key)
    * Long term processor averages view (using the l key)
    * Large page analysis (using the L key)
    * Paging space (using the P key)
    * Volume group statistics (using the V key)
    * Disk statistics (using the D key)
    * Disk statistics with graph (using the d key)
    * Memory and paging statistics (using the m key)
    * Adapter I/O statistics (using the a key)
    * Shared Ethernet adapter statistics (using the O key)
    * Verbose checks OK/Warn/Danger view (using the v key)
    * Detailed Page Statistics (using the M key)
    * Fibre channel adapter statistics (using the ^ key)

Reports selected local and remote system statistics

topas
Monitors system 10 top processes with 2 seconds
monitor -top 10 -s 2
Displays disks activity every 2 seconds refresh interval
iostat 2
Displays the adapter and disk throughput report every 2 seconds
iostat -a 2
Monitors virtual memory statistics every 2 seconds
vmstat 2
Show all CPU’s activity on an SMP machine 2 times separated by 2 seconds.
Each row is for 1 logical processor.
'U' row is for system-wide Unused capacity.
Last line is for system-wide statistics.
sar -P ALL 2 2
Monitors real and virtual memory
svmon -i 2
Shows top 10 memory usage by process
ps auxw | sort -r +3 |head -10
Shows top 10 CPU usage by process
ps auxw | sort -r +2 |head -10
Traces FS,LV,disks,files activity of a “find” command into a logfile (filemon.out). Must be ended with a trcstop command.
filemon -O all -o filemon.out ; find / -name core ; trcstop
Traces CPU activity of a “find” command. Several logfiles are created. Must be ended with a trcstop command.
tprof -x find / -name core ; trcstop
Shows the DNS server name and address
nslookup hostname
Trace CPU activity for next 30 seconds. Results in file sleep.tprof
tprof -ske -x "sleep 30"

Commands for paging.

Paging space settings.
lsps -a

Commands for user accounts

Environment settings - show user ulimit
env ulimit
Shows all user parameters (max. file size, etc.)
lsuser -f root
Lists login users and their programs.
w

Commands to browse errlog

Generates a report of logged errors. Default command
errpt
All details
errpt -a
Moderate details
errpt -A
Eliminates double entries
errpt -D
Browse the errlog in detail for all errors within a timeframe
errpt -a  -s 0604090601  -e 0605090901
where the date format is mmddhhmmyy
Generates a report of resource names specified by the ResourceNameList variable (SYSPROC)
errpt -a  -N SYSPROC |more
Browse the errlog by the identifier
errpt -j 5DFED6F1
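
The mmddhhmmyy stamp is easy to get wrong by hand, so here is a small sketch that rearranges a more readable date into that form. The helper name errpt_stamp and the "YYYY-MM-DD HH:MM" input format are my own assumptions; the helper only shuffles digits and does not validate the date.

```shell
#!/bin/sh
# errpt_stamp "YYYY-MM-DD HH:MM": print the mmddhhmmyy form used by
# errpt -s / -e. Hypothetical helper; no date validation is done.
errpt_stamp() {
    echo "$1" | awk -F'[- :]' '{ printf "%s%s%s%s%s\n", $2, $3, $4, $5, substr($1, 3, 2) }'
}

# Rebuild the time window used in the errpt -s / -e example.
START=$(errpt_stamp "2001-06-04 09:06")
END=$(errpt_stamp "2001-06-05 09:09")
echo "errpt -a -s $START -e $END"
```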

Misc

Find all files containing the IP address "10.10.10.13"
ksh
find / -type f|xargs grep "10.10.10.13" 2> /dev/null
Compresses the files while keeps the original
compress -c file > file.Z
Returns full path of program (Similar to 'which' command in Solaris / Linux. Also available in AIX)
whereis  

Finding out which HMC your server is attached to

Find out which HMC your server is attached to (in AIX 6.1 / 7.1)

bash-3.2# lsrsrc IBM.MCP
Resource Persistent Attributes for IBM.MCP
resource 1:
        MNName           = "10.10.10.15"
        NodeID           = 4935249694248815870
        KeyToken         = "10.10.10.10"
        IPAddresses      = {"10.0.3.20","10.10.10.15","10.10.10.16","fe80::e62f:13ff:f231:6218","fe80::e62f:13ff:fe33:24b0","fe80::e62f:13ff:fe33:24b2"}
        ActivePeerDomain = ""
        NodeNameList     = {"myserver"}
In this example, we see that myserver is connected to HMC via 10.10.10.15 and 10.10.10.16.

How a small difference in NFS can cause sleepless nights

Setting up NFS to share to non-AIX hosts

I wanted to set up NFS between AIX and non-AIX systems. Being more familiar with NFS setup in Solaris and Linux, I thought the setup would be a breeze.

What went wrong

AIX is very strict on who can access the partition, i.e. root access has to be granted explicitly. Otherwise, even though the NFS share is created, you will not be able to mount the partition from the other hosts.

Also ensure that both the NFS server and the client can use the same protocol version and security method, otherwise the result is the same as above. As NFSv3 is the more commonly used protocol, it is recommended for this setup.

Lastly, state explicitly which hosts get read-write access and which get read-only access.
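
To make the three points concrete, here is a sketch that prints one /etc/exports entry in the AIX style. The option names (vers, sec, access, rw, root) are as I understand the AIX exports format, where hosts in "access" but not in "rw" end up read-only, so please check them against the documentation for your AIX level. The helper name and the client names are made up.

```shell
#!/bin/sh
# exports_line DIR VERS ACCESS_HOSTS RW_HOSTS ROOT_HOSTS
# Print one AIX-style /etc/exports entry. Host lists are colon-separated;
# an empty list leaves that option out. It only formats text and changes
# nothing on disk.
exports_line() {
    opts="vers=$2,sec=sys"
    [ -n "$3" ] && opts="$opts,access=$3"
    [ -n "$4" ] && opts="$opts,rw=$4"
    [ -n "$5" ] && opts="$opts,root=$5"
    printf '%s -%s\n' "$1" "$opts"
}

# linux01, sol01 and sol02 are made-up client names.
exports_line /export/data 3 "linux01:sol01:sol02" "linux01:sol01" "linux01"
```

After adding such a line to /etc/exports, the share still has to be re-exported (for example with exportfs -a) before clients can mount it.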

Sunday, September 11, 2011

How NPIV can save on fibre cabling for SAN

What is NPIV

NPIV, which stands for N-Port ID Virtualisation, is a Fibre Channel facility allowing multiple N-Port IDs to share a single physical N-Port. This allows multiple Fibre Channel initiators to occupy a single physical port, easing hardware requirements in the SAN design.

It is noted that NPIV is an extension to a standard already defined in the fibre channel protocols that allow one to get past single initiator/single target design limitations.

In order to take advantage of this, both the HBA card (from the host and SAN array) and the switch must support NPIV to generate and publish an additional WWPN in a virtual fashion.

Why is it good

Traditionally, we provide at least 2 fibre links for each host in the production environment, each link on 1 controller which is connected to 1 SAN switch. With 4 LPARs in the p7 server requiring SAN connection, we potentially need at least 8 fibre links with 8 SAN ports allocated. Additional links and SAN ports are needed to connect to the SAN arrays.

With NPIV, we can use just 2 fibre links to connect the SAN switch and the p7 server. This is a saving of 75% on the host links (2 instead of 8)!
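
The 75% figure is simple arithmetic, sketched here so the numbers can be swapped for another server. The helper name savings_pct is made up, and the link counts are the ones from the example above.

```shell
#!/bin/sh
# savings_pct BEFORE AFTER: integer percentage saved when going from
# BEFORE links to AFTER links. Hypothetical helper for the arithmetic only.
savings_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

LPARS=4
LINKS_PER_LPAR=2
BEFORE=$(( LPARS * LINKS_PER_LPAR ))   # 8 dedicated host links
AFTER=2                                # 2 shared NPIV links
echo "links before: $BEFORE, after: $AFTER, saved: $(savings_pct "$BEFORE" "$AFTER")%"
```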

Why it may be bad

When lots of hosts share the same fibre link, a link failure will be catastrophic. It may be mitigated by having 2 HBA controllers with 2 links each, with connections distributed across 2 different SAN switches.

How Disk Expansion on AIX can fail

IBM has been boasting about the ease of disk space management on its AIX hosts. Sure, we are able to expand the file system on AIX on many occasions, e.g. /opt, /home, /tmp and so on. However, trying it on /opt/oracle/oradata failed. Let's see why.

The story.

My oravg was full due to rapid expansion of the business and needed more space. No impact was expected, as disk space expansion should happen on the fly. I had personally tried it on /tmp, / and /var, so I was pretty confident that this would be OK.

The expansion completed, but the "df" command did not show the expected changes. Then the phone started ringing. The Oracle DB had crashed.

Binary installation was still intact but the DB is gone.

Troubleshooting and recovery

fsck showed that the "bit allocation map" was corrupted. I tried to create another VG and copy the LV over to the new VG, to see if the inodes, files, links and bit allocation map could be recreated. The map couldn't be recreated.

Searching in Google did not yield any useful information other than creating a snapshot to back up the system, or applying a snapshot to recover it. That is a provision for quick recovery rather than what I needed.

It seems that Oracle DB or IBM DB2 are "aware" of the physical disk boundaries, and when the boundaries are changed, they do not know how to react and hence crash. In addition, there was no space for the OS to process the file system expansion since it was 100% full. The next recovery step was to wipe the oravg clean and rebuild the DB.

The painful thing was that the original build of the DB took more than 3 days, and this rebuild took another weekend. Ouch.

The take away

Before doing any disk expansion for a DB, or on any other disk where the applications running off the disk are "aware" of the disk setup, please shut the application down before making any disk changes. In addition, clear up some space first so that the OS has some leeway to process the file system expansion on the fly.
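
A minimal sketch of the "leave some leeway" check, assuming "df -P" (the POSIX output layout, where column 5 is the capacity). The function names and the 95% limit are my own assumptions, and passing this check does not replace shutting the DB down first.

```shell
#!/bin/sh
# pct_used: read "df -P" output for one filesystem on stdin and print the
# integer percentage in use (column 5 of the POSIX layout).
pct_used() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# has_headroom MAX_PCT: succeed only if the usage read from stdin is below
# MAX_PCT. The function names and the 95 below are assumptions.
has_headroom() {
    used=$(pct_used)
    [ -n "$used" ] && [ "$used" -lt "$1" ]
}

# Usage before growing a filesystem:
#   if df -P /opt/oracle/oradata | has_headroom 95; then
#       echo "some headroom left; stop the DB, then expand"
#   else
#       echo "nearly full; free some space first"
#   fi
```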

I'm lucky that this is a trial setup.

So do not believe blindly that disk space management is on the fly without strings attached. There are bound to be some obscure conditions that will break it.

Murphy's law - "Anything that can go wrong, will go wrong"

My thoughts on how 802.1Q can save on network cabling, and the potential problems.

IEEE 802.1Q, commonly known as VLAN tagging, is a networking standard for sharing a physical Ethernet network link between multiple independent logical networks.

The tag is added at the MAC layer, and the standard works alongside the Spanning Tree Protocol (802.1D). Nodes / hosts on different VLANs communicate with each other through a network switch or router at the Network Layer.

VLAN tagging

Say I have 2 different environments where servers are members of different network segments. To allow the different hosts in the p750 server to share the same physical Ethernet link, the switch or router needs to understand and route the network traffic for both 10.10.10.* and 10.10.20.* to the p750 machine.

Within the p750 machine, the NIC is capable of deciphering the VLAN ID and routes inward traffic to the designated hosts or LPARs. For outward traffic, the NIC tags the traffic with the VLAN ID and routes it out to the gateway.

This is similar to ISL, the trunk encapsulation that is proprietary to Cisco. (VTP, also proprietary to Cisco, distributes VLAN definitions between switches rather than tagging traffic.)
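
For the curious, the tag itself is only 4 bytes: a fixed TPID of 0x8100 followed by a 16-bit TCI holding a 3-bit priority (PCP), a 1-bit drop-eligible flag (DEI) and the 12-bit VLAN ID. Here is a small sketch that builds it. The helper name vlan_tag is made up, and VLAN IDs 10 and 20 are only my assumption for the two segments above.

```shell
#!/bin/sh
# vlan_tag PCP DEI VID: print the 4-byte 802.1Q tag as hex.
# TPID is the fixed value 0x8100; the TCI packs PCP (3 bits),
# DEI (1 bit) and the VLAN ID (12 bits).
vlan_tag() {
    tci=$(( ($1 << 13) | ($2 << 12) | ($3 & 0xFFF) ))
    printf '8100%04x\n' "$tci"
}

# VLAN IDs 10 and 20 are assumptions standing in for the
# 10.10.10.* and 10.10.20.* segments.
vlan_tag 0 0 10   # 8100000a
vlan_tag 0 0 20   # 81000014
```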

Why is it good.

With 2 VIO servers, the different LPARs in the same IBM p750 machine would traditionally need more than 20 UTP cables. With 802.1Q, we need only 2 cables for all the LPARs and 2 cables for the HMC. This is a saving of more than 80%!

Why it may be bad

When lots of hosts share the same UTP link, a link failure will be catastrophic. It may be mitigated by having 2 NIC controllers with 2 links each, with connections distributed across 2 different switches.

I guess you can't have your cake and eat it!

How to automatically redirect HTTP to HTTPS in Apache

Redirecting HTTP to HTTPS is a common and popular way to protect user privacy and sensitive information without making users type 'https' manually to access your site.

First, we verify that Apache is configured for HTTPS connections and that the necessary SSL certificates are already in place.

Then, we use either redirect or mod_rewrite.

  • Using mod_rewrite. Add these directives to your configuration file:
              RewriteEngine On
              RewriteCond %{SERVER_PORT} !^443$
              RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [L,R]
    Make sure you have loaded the mod_rewrite module into Apache.

  • Using redirect. Add these directives to your configuration file:
              SSLRequireSSL
              Redirect permanent /secure https://www.domain.com/secure

The 2nd method, which uses redirect, needs one less module, so security-wise it could be better. In addition, you don't need to worry about rewrite logs and so on. Now we restart Apache and go test it out.
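
One way to test it from a shell is to look at the response headers of a plain HTTP request. This sketch checks for a 301 or 302 status and a Location header starting with https://. The helper name is_https_redirect is made up, and the URL in the usage comment is the placeholder domain from the config above.

```shell
#!/bin/sh
# is_https_redirect: read HTTP response headers on stdin and succeed only
# if the status is 301 or 302 and the Location header points to https://.
# Hypothetical helper; the header name is matched case-insensitively.
is_https_redirect() {
    awk '
        NR == 1 && ($2 == 301 || $2 == 302) { redirect = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { https = 1 }
        END { exit !(redirect && https) }
    '
}

# Usage against a live server (curl -sI fetches only the headers):
#   curl -sI http://www.domain.com/secure | is_https_redirect && echo "redirect OK"
```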

How to check LVM health in AIX

What to check for LVM

We can check the state and the utilisation of the PV and VG: whether the state is open, closed, stale or syncd, and whether the utilisation is already maxed out.

As LVM consists of many components such as PP, PV, LV, VG and so on, the best way is to script the things to check. For details of what the acronyms mean, please refer to AIX LVM concept and Disk structure.

My newbie demo script for checking LVM health. Gurus out there can really help me along with some suggestions if you think I can do better. :D

#!/bin/ksh
# FILENAME   : checkLVM.ksh
# AUTHOR     : Victor Kwan
# EMAIL     : victorkk [AT] gmail [DOT] com
# PURPOSE    : To check the health of PV, VG and LV
#            : and alert sys admin if threshold is breached.
# DATE       : Feb 2011
#

#
# Parameter setup
OUTPUTFILE="checkLVM.PV.`hostname`.`date '+%d%b%Y'`.output"
NOTIFICATION_MSG="checkLVM.PV.`hostname`.`date '+%d%b%Y'`.message"
isFOUND=0
isERROR=0

if [ $# -ne 2 ]
then
        printf "Usage: \n\t$0 PV_THRESHOLD EMAIL\n\n"
        exit 1
fi

PV_THRESHOLD=$1
EMAIL="$2"

#
# Extract PV Information from ODM
lspv | while read PV; do
        printf "\n$PV\n" >> $OUTPUTFILE
        printf "--------------------------\n" >> $OUTPUTFILE
        lspv $PV >> $OUTPUTFILE
done

#
# Check for PV Errors
grep -n "PV STATE" $OUTPUTFILE > $OUTPUTFILE.PV

printf "\n\n\n------------------------------\n" > $NOTIFICATION_MSG
printf " Check for PV errors\n" >> $NOTIFICATION_MSG
printf "------------------------------\n" >> $NOTIFICATION_MSG

cat $OUTPUTFILE.PV | while read PVLINE
do
        isLOGICAL_CHECK=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $3}' | grep -v "active" | wc -l`
        #printf "[DEBUG]isLOGICAL_CHECK is %d.\n" $isLOGICAL_CHECK

        if [ $isLOGICAL_CHECK == 1 ]
        then
                PV_STATUS=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $3}' | grep -v "active"`
                PV_LINE=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $1}'`
                PV=`head -$PV_LINE $OUTPUTFILE | tail -3 | head -1 | awk '{print $3}'`
                VG=`head -$PV_LINE $OUTPUTFILE | tail -3 | head -1 | awk '{print $6}'`
                PV_LINE_TOTALPP=`echo $PV_LINE + 3 | bc`
                PV_SIZE=`head -$PV_LINE_TOTALPP $OUTPUTFILE | tail -1 | awk '{print $4}' | awk -F\( '{print $2}'`
                PV_LINE_USEDPP=`echo $PV_LINE + 5 | bc`
                PV_USED=`head -$PV_LINE_USEDPP $OUTPUTFILE | tail -1 | awk '{print $4}' | awk -F\( '{print $2}'`

                #printf "[DEBUG]The line is $PVLINE\n"
                printf "Volume Group: %s\n" $VG >> $NOTIFICATION_MSG
                printf "Physical Volume: %s\n" $PV >> $NOTIFICATION_MSG
                printf "Status     | Size (Mb) | Used (Mb)\n" >> $NOTIFICATION_MSG
                printf "%-10s | %-9d | %-8d\n\n" $PV_STATUS $PV_SIZE $PV_USED >> $NOTIFICATION_MSG
                isFOUND=1
                isERROR=1
        fi
done

if [ $isERROR == 0 ]
then
        printf "All Physical Volumes are clean.\n" >> $NOTIFICATION_MSG
fi

# Check for PV full
grep -n "PV STATE" $OUTPUTFILE > $OUTPUTFILE.PV
isERROR=0

printf "\n\n\n------------------------------\n" >> $NOTIFICATION_MSG
printf " Check for PV utilisation\n" >> $NOTIFICATION_MSG
printf " PV Threshold: $PV_THRESHOLD \n" >> $NOTIFICATION_MSG
printf "------------------------------\n" >> $NOTIFICATION_MSG

cat $OUTPUTFILE.PV | while read PVLINE
do
        isLOGICAL_CHECK=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $3}' | wc -l`
        #printf "[DEBUG]isLOGICAL_CHECK is %d.\n" $isLOGICAL_CHECK

        if [ $isLOGICAL_CHECK == 1 ]
        then

                PV_STATUS=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $3}'`
                PV_LINE=`echo $PVLINE | grep "PV STATE" | awk -F: '{print $1}'`
                PV=`head -$PV_LINE $OUTPUTFILE | tail -3 | head -1 | awk '{print $3}'`
                VG=`head -$PV_LINE $OUTPUTFILE | tail -3 | head -1 | awk '{print $6}'`
                PV_LINE_TOTALPP=`echo $PV_LINE + 3 | bc`
                PV_SIZE=`head -$PV_LINE_TOTALPP $OUTPUTFILE | tail -1 | awk '{print $4}' | awk -F\( '{print $2}'`
                PV_LINE_USEDPP=`echo $PV_LINE + 5 | bc`
                PV_USED=`head -$PV_LINE_USEDPP $OUTPUTFILE | tail -1 | awk '{print $4}' | awk -F\( '{print $2}'`
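                PV_PERCENTAGE=$(echo "scale=8; $PV_USED / $PV_SIZE * 100" | bc)
                # Integer percentage for the test below; -ge cannot reliably compare decimals
                PV_PERCENT_INT=$(echo "$PV_USED * 100 / $PV_SIZE" | bc)

                if [ $PV_PERCENT_INT -ge $PV_THRESHOLD ]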
                then
                        #printf "[DEBUG]The line is $PVLINE\n" >> $NOTIFICATION_MSG
                        printf "Volume Group: %s\n" $VG >> $NOTIFICATION_MSG
                        printf "Physical Volume: %s\n" $PV >> $NOTIFICATION_MSG
                        printf "Status     | Size (Mb) | Used (%%)\n" >> $NOTIFICATION_MSG
                        printf "%-10s | %-9d | %-5.2f\n\n" $PV_STATUS $PV_SIZE $PV_PERCENTAGE >> $NOTIFICATION_MSG
                        isFOUND=1
                        isERROR=1
                fi
        fi
done

if [ $isERROR == 0 ]
then
        printf "All Physical Volume within threshold.\n" >> $NOTIFICATION_MSG
fi

rm $OUTPUTFILE
rm $OUTPUTFILE.PV

# Extract VG Information from ODM
lsvg | while read VG
do
        print "\nListing $VG:\n" >> $OUTPUTFILE
        lsvg $VG >> $OUTPUTFILE
        lsvg -l $VG >> $OUTPUTFILE
        #lsvg -l $VG | egrep -v "^$VG:" | egrep -v "^LV NAME" | while read LV JUNK
        #do
        #       lslv $LV >> $OUTPUTFILE
        #done
done

# Check for VG errors in ODM
grep -n "VG STATE" $OUTPUTFILE > $OUTPUTFILE.VG
isERROR=0

printf "\n\n\n------------------------------\n" >> $NOTIFICATION_MSG
printf " Check for VG errors\n" >> $NOTIFICATION_MSG
printf "------------------------------\n" >> $NOTIFICATION_MSG

cat $OUTPUTFILE.VG | while read VGLINE
do
        isVG_CHECK=`echo $VGLINE | grep "VG STATE" | awk -F: '{print $3}' | wc -l`
        #printf "[DEBUG]isVG_CHECK is %d.\n" $isVG_CHECK

        if [ $isVG_CHECK == 1 ]
        then
                VG_STATUS=`echo $VGLINE | grep "VG STATE" | awk '{print $3}'`
                VG_LINE=`echo $VGLINE | grep "VG STATE" | awk -F: '{print $1}'`
                VG=`head -$VG_LINE $OUTPUTFILE | tail -2 | head -1 | awk '{print $3}'`
                VG_LINE_TOTALPP=`echo $VG_LINE + 1 | bc`
                VG_TOTALPP=`head -$VG_LINE_TOTALPP $OUTPUTFILE | tail -1 | awk '{print $7}' | awk -F\( '{print $2}'`
                VG_LINE_USEDPP=`echo $VG_LINE + 3 | bc`
                VG_USEDPP=`head -$VG_LINE_USEDPP $OUTPUTFILE | tail -1 | awk '{print $6}' | awk -F\( '{print $2}'`
                VG_LINE_TOTALPV=`echo $VG_LINE + 5 | bc`
                VG_TOTALPV=`head -$VG_LINE_TOTALPV $OUTPUTFILE | tail -1 | awk '{print $3}'`
                VG_LINE_STALEPV=`echo $VG_LINE + 6 | bc`
                VG_STALEPV=`head -$VG_LINE_STALEPV $OUTPUTFILE | tail -1 | awk '{print $3}'`
                VG_STALEPP=`head -$VG_LINE_STALEPV $OUTPUTFILE | tail -1 | awk '{print $6}'`
                VG_LINE_ACTIVEPV=`echo $VG_LINE + 7 | bc`
                VG_ACTIVEPV=`head -$VG_LINE_ACTIVEPV $OUTPUTFILE | tail -1 | awk '{print $3}'`

                PV_LINE=`lsvg -p $VG | wc -l`
                PV_NUMOFMEMBERS=`echo $PV_LINE - 2 | bc`
                PV_NAME="`lsvg -p $VG | tail -$PV_NUMOFMEMBERS | awk '{print $1}' | xargs`"
                PV_STALENAME=`lsvg -p $VG | tail -$PV_NUMOFMEMBERS | grep -v active | awk '{print $1}' | xargs`

                if [ -z "$PV_NAME" ]
                then
                        PV_NAME="NA"
                fi

                LV_LINE=`lsvg -l $VG | wc -l`
                LV_NUMOFMEMBERS=`echo $LV_LINE - 2 | bc`
                LV_NUMOFPROBLEM=`lsvg -l $VG | tail -$LV_NUMOFMEMBERS | grep -v "open/syncd" |  wc -l`
                LV_NUMOFOPEN=`echo $LV_NUMOFMEMBERS - $LV_NUMOFPROBLEM | bc`
                LV_PROBLEMNAME=`lsvg -l $VG | tail -$LV_NUMOFMEMBERS | grep -v "open/syncd" | awk '{print $1}' | xargs`
                LV_NAME="`lsvg -l $VG | tail -$LV_NUMOFMEMBERS | awk '{print $1}' | xargs`"

                if [ -z "$LV_NAME" ]
                then
                        LV_NAME="NA"
                fi

                #printf "[DEBUG]The line is $VGLINE\n"

                if [ $VG_STALEPP -ge 1 -o $VG_STALEPV -ge 1 -o $LV_NUMOFPROBLEM -ge 1 ]
                then
                        printf "Volume Group: %s\nVolume Group Status: %s\n\n" $VG $VG_STATUS >> $NOTIFICATION_MSG

                        printf "Total PP Size (Mb) | Used PP Size (Mb) | Stale PP\n" >> $NOTIFICATION_MSG
                        printf "%-18d | %-17d | %-5d \n\n" $VG_TOTALPP $VG_USEDPP $VG_STALEPP >> $NOTIFICATION_MSG

                        printf "Total PV | Active PV | All PV members\n" >> $NOTIFICATION_MSG
                        printf "%-8d | %-9d | %-s\n\n" $VG_TOTALPV $VG_ACTIVEPV "$PV_NAME" >> $NOTIFICATION_MSG

                        printf "Total LV | Open LV | All LV members\n" >> $NOTIFICATION_MSG
                        printf "%-8d | %-7d | %-s\n\n" $LV_NUMOFMEMBERS $LV_NUMOFOPEN "$LV_NAME" >> $NOTIFICATION_MSG

                        if [ $VG_STALEPV -ge 1 ]
                        then
                                printf "Status of PV with problems:\n" >> $NOTIFICATION_MSG
                                for i in $PV_STALENAME
                                do
                                        THIS_PV=`lspv $i | grep "PV STATE" | awk '{print $1,$2,$3}'`
                                        printf "$i ($THIS_PV) \n" >> $NOTIFICATION_MSG
                                done
                        fi
                        printf "\n" >> $NOTIFICATION_MSG

                        if [ $LV_NUMOFPROBLEM -ge 1 ]
                        then
                                LV_FLAG=0
                                for i in $LV_PROBLEMNAME
                                do
                                        THIS_LV=`lslv $i | grep "LV STATE" | awk '{print $4,$5,$6}'`
                                        THIS_STATE=`lslv $i | grep "LV STATE" | awk '{print $6}'`
                                        THIS_BOOT=`lslv $i | grep "TYPE" | awk '{print $2}' | grep "boot" | wc -l`
                                        THIS_DUMP=`lslv $i | grep "TYPE" | awk '{print $2}' | grep "sysdump" | wc -l`
                                        if [ "$THIS_STATE" != "closed/syncd" ]
                                        then
                                                if [ $LV_FLAG = 0 ]
                                                then
                                                        printf "Status of LV with problems:\n" >> $NOTIFICATION_MSG
                                                        LV_FLAG=1
                                                fi
                                                printf "$i ($THIS_LV) \n" >> $NOTIFICATION_MSG
                                        elif [ "$THIS_STATE" != "open/syncd" -a $THIS_DUMP = "1" ]
                                        then
                                                if [ $LV_FLAG = 0 ]
                                                then
                                                        printf "Status of LV with problems:\n" >> $NOTIFICATION_MSG
                                                        LV_FLAG=1
                                                fi
                                                printf "$i ($THIS_LV) \n" >> $NOTIFICATION_MSG
                                        elif [ "$THIS_STATE" != "closed/syncd" -a $THIS_BOOT = "1" ]
                                        then
                                                if [ $LV_FLAG = 0 ]
                                                then
                                                        printf "Status of LV with problems:\n" >> $NOTIFICATION_MSG
                                                        LV_FLAG=1
                                                fi
                                                printf "$i ($THIS_LV) \n" >> $NOTIFICATION_MSG
                                        fi
                                done
                        fi
                        printf "\n\n" >> $NOTIFICATION_MSG
                        isFOUND=1
                        isERROR=1
                fi
        fi
done

if [ $isERROR == 0 ]
then
        printf "All Volume Groups are clean.\n" >> $NOTIFICATION_MSG
fi

if [ $isFOUND == 1 ]
then
        cat $NOTIFICATION_MSG | mailx -s "[`hostname`] LVM Errors" $EMAIL
fi

rm $OUTPUTFILE
rm $OUTPUTFILE.VG
rm $NOTIFICATION_MSG
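
The script finds its fields by counting lines up and down from "PV STATE", which breaks if the lspv layout shifts. A possible alternative, sketched below, is to match on the labels instead. The helper name pv_summary is made up, and the sample is a trimmed, made-up excerpt laid out the way the script above expects lspv output to look.

```shell
#!/bin/sh
# pv_summary: read "lspv <hdisk>" output on stdin and print
# "<pv> <vg> <state> <percent used>", matching on labels instead of
# line offsets. Hypothetical helper.
pv_summary() {
    awk '
        /^PHYSICAL VOLUME:/ { pv = $3; vg = $6 }
        /^PV STATE:/        { state = $3 }
        /^TOTAL PPs:/       { total = $4; gsub(/[^0-9]/, "", total) }
        /^USED PPs:/        { used = $4;  gsub(/[^0-9]/, "", used) }
        END {
            if (total + 0 > 0)
                printf "%s %s %s %d\n", pv, vg, state, used * 100 / total
        }
    '
}

# Made-up sample in the layout the script above assumes.
pv_summary <<'EOF'
PHYSICAL VOLUME:    hdisk0                   VOLUME GROUP:     rootvg
PV STATE:           active
TOTAL PPs:          546 (69888 megabytes)    VG DESCRIPTORS:   2
USED PPs:           506 (64768 megabytes)    MAX REQUEST:      256 kilobytes
EOF
```

On a real host this would be called as "lspv hdisk0 | pv_summary", once per disk listed by lspv.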

How to check status of subgroup TCPIP in AIX

To get the status of the subsystem group TCPIP

# lssrc -g tcpip
Subsystem         Group            PID          Status
 inetd            tcpip            3997848      active
 hostmibd         tcpip            2752750      active
 aixmibd          tcpip            3539102      active
 xntpd            tcpip            9633868      active
 muxatmd          tcpip                         inoperative
 rwhod            tcpip                         inoperative
 snmpd            tcpip                         inoperative
 snmpmibd         tcpip                         inoperative
 dpid2            tcpip                         inoperative
 dhcpcd           tcpip                         inoperative
 dhcpcd6          tcpip                         inoperative
 ndpd-host        tcpip                         inoperative
 ndpd-router      tcpip                         inoperative
 tftpd            tcpip                         inoperative
 gated            tcpip                         inoperative
 named            tcpip                         inoperative
 routed           tcpip                         inoperative
 iptrace          tcpip                         inoperative
 timed            tcpip                         inoperative
 dhcpsd           tcpip                         inoperative
 dhcpsdv6         tcpip                         inoperative
 dhcprd           tcpip                         inoperative
 mrouted          tcpip                         inoperative
 pxed             tcpip                         inoperative
 binld            tcpip                         inoperative
 dfpd             tcpip                         inoperative
Easy?! :)
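
If only the running subsystems matter, the listing can be filtered on its last column. A minimal sketch follows; the helper name active_subsystems is made up.

```shell
#!/bin/sh
# active_subsystems: read "lssrc -g GROUP" output on stdin and print the
# names of subsystems whose last column is "active". Skips the header line.
active_subsystems() {
    awk 'NR > 1 && $NF == "active" { print $1 }'
}

# Usage on AIX:
#   lssrc -g tcpip | active_subsystems
```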

How to configure link aggregation in AIX

Link aggregation means you can give one IP address to two network cards and connect them to two different switches for redundancy. In this setup only one network card is active at a time; when it fails, the other network card becomes active and lets us continue our work.

It is easier to do this through SMIT. (For newbies like me :D )

# smit
then go to Devices > Communication > EtherChannel / IEEE 802.3ad Link Aggregation > Add An EtherChannel / Link Aggregation

Here select the network card that you want to use as the active adapter.

Eg: select ent0

Important: then select Mode as 8023ad

Then select the backup adapter for redundancy (press F4 to show the network adapters).

Eg: ent1

Press Enter.

Now ent0 and ent1 are bonded.

A virtual adapter named ent2 is then created automatically.

Then assign the IP address and the other network settings to this virtual adapter.
# smit
Communications Applications and Services > TCP/IP > Minimum Configuration & Startup

Here select ent2 (the new bonded virtual adapter),

enter the IP address and the other settings,

and choose the start now option.

You have now completed link aggregation. Check whether it works by removing the 2nd cable from the network card and checking ping, then reconnect the 2nd cable and remove the 1st cable. Losing 2 - 3 pings is normal.
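
For those who prefer the command line, the SMIT panel should correspond to an mkdev call on the ibm_ech pseudo adapter type. The sketch below only prints the command and does not run it. The attribute names (adapter_names, backup_adapter, mode) are as I recall them, so treat them as an assumption and compare with "lsattr -El ent2" on a system configured through SMIT.

```shell
#!/bin/sh
# etherchannel_cmd PRIMARY BACKUP MODE: print (do not run) an mkdev command
# that should match the SMIT fields filled in above. Hypothetical helper;
# attribute names are an assumption to verify with lsattr.
etherchannel_cmd() {
    printf 'mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=%s -a backup_adapter=%s -a mode=%s\n' "$1" "$2" "$3"
}

etherchannel_cmd ent0 ent1 8023ad
```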

How to configure NTP in AIX 7.1

First, verify that you have a server suitable for synchronisation. Note that the offset must be less than 1000 seconds for xntpd to sync. If the offset is more than 1000 seconds, change the time manually on the client and try again. Ensure that the firewall is open for port 123 (UDP).

# ntpq -d 10.10.10.254
ntpq> ^C

# ntpdate -d 10.10.10.254
23 Feb 14:41:24 ntpdate[4653300]: 3.4y
transmit(10.10.10.254)
receive(10.10.10.254)
transmit(10.10.10.254)
receive(10.10.10.254)
transmit(10.10.10.254)
receive(10.10.10.254)
transmit(10.10.10.254)
receive(10.10.10.254)
transmit(10.10.10.254)
server 10.10.10.254, port 123
stratum 2, precision -18, leap 00, trust 000
refid [10.10.10.250], delay 0.02632, dispersion 0.00002
transmitted 4, in filter 4
reference time:      d10f27e7.15e9bbb7  Wed, Feb 23 2011 14:29:59.085
originate timestamp: d10f2a94.f62731bd  Wed, Feb 23 2011 14:41:24.961
transmit timestamp:  d10f2a94.f62e7000  Wed, Feb 23 2011 14:41:24.961
filter delay:  0.02638  0.02632  0.02637  0.02637
               0.00000  0.00000  0.00000  0.00000
filter offset: -0.00046 -0.00047 -0.00050 -0.00051
               0.000000 0.000000 0.000000 0.000000
delay 0.02632, dispersion 0.00002
offset -0.000479

23 Feb 14:41:24 ntpdate[4653300]: adjust time server 10.10.10.254 offset -0.000479
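
The 1000-second rule can be checked from the same ntpdate output. Here is a small sketch that reads the standalone "offset" line and compares its absolute value with the limit. The helper name ntp_offset_ok is made up.

```shell
#!/bin/sh
# ntp_offset_ok: read "ntpdate -d" output on stdin and succeed only if the
# standalone "offset" line is within +/- 1000 seconds.
ntp_offset_ok() {
    awk '
        /^offset / { off = $2; seen = 1 }
        END {
            if (off < 0) off = -off
            exit !(seen && off < 1000)
        }
    '
}

# Usage:
#   ntpdate -d 10.10.10.254 | ntp_offset_ok && echo "offset small enough for xntpd"
```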
Now specify xntp server in /etc/ntp.conf. Add in the line "server ip-address-of-the-server prefer". Leave the driftfile and tracefile at their defaults.
vi /etc/ntp.conf
Now, ensure xntpd is started after every reboot.
vi /etc/rc.tcpip
Uncomment the following line
start /usr/sbin/xntpd "$src_running"
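
Both edits can also be scripted instead of done in vi. The sketch below uses two made-up helpers: one appends a line only if it is not already there, the other removes the leading "#" from the xntpd start line. Take a backup copy of both files first.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line is
# already present. Hypothetical helper; FILE must already exist.
ensure_line() {
    grep -qxF -e "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# uncomment_xntpd FILE: strip the leading "#" from the xntpd start line.
# Goes through a temporary file because not every sed supports -i.
uncomment_xntpd() {
    tmp="$1.tmp.$$"
    sed 's|^#[[:space:]]*\(start /usr/sbin/xntpd \)|\1|' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Usage (as root):
#   cp /etc/ntp.conf /etc/ntp.conf.bak
#   cp /etc/rc.tcpip /etc/rc.tcpip.bak
#   ensure_line /etc/ntp.conf "server 10.10.10.254 prefer"
#   uncomment_xntpd /etc/rc.tcpip
```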
Then, start the client and verify.
# lssrc -s xntpd
Subsystem         Group            PID          Status
 xntpd            tcpip                         inoperative
# startsrc -s xntpd
0513-059 The xntpd Subsystem has been started. Subsystem PID is 4391134.
# lssrc -s xntpd
Subsystem         Group            PID          Status
 xntpd            tcpip            4391134      active
# ntpq -p
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*10.10.10.254   10.10.10.250      2 u   25   64  377     0.87    0.266    0.02
For an AIX NTP server, first verify that you have a suitable NTP server. "Sys peer" should show a valid server or 127.127.1.0. Otherwise, if the server is "insane", you will need to add a server line to /etc/ntp.conf and restart xntpd.
# lssrc -ls xntpd
 Program name:    /usr/sbin/xntpd
 Version:         3
 Leap indicator:  00 (No leap second today.)
 Sys peer:        10.10.10.254
 Sys stratum:     3
 Sys precision:   -18
 Debug/Tracing:   DISABLED
 Root distance:   0.002701
 Root dispersion: 0.001129
 Reference ID:    10.10.10.254
 Reference time:  d10f3206.cca44000  Wed, Feb 23 2011 15:13:10.799
 Broadcast delay: 0.003906 (sec)
 Auth delay:      0.000122 (sec)
 System flags:    bclient pll monitor filegen
 System uptime:   2653 (sec)
 Clock stability: 1.338257 (sec)
 Clock frequency: 0.000000 (sec)
 Peer: 10.10.10.254
      flags: (configured)(sys peer)(preferred)
      stratum:  2, version: 3
      our mode: client, his mode: server
Subsystem         Group            PID          Status
 xntpd            tcpip            4391134      active
To add server line into /etc/ntp.conf
vi /etc/ntp.conf
Add the following line and ensure "broadcast client" is commented out.
server 127.127.1.0
Restart xntpd
stopsrc -s xntpd
startsrc -s xntpd
If the server runs databases, use the -x flag to prevent the clock from changing in a negative direction. Add into /etc/rc.tcpip if necessary. Remember to use the double quote as in "-x". The whole process may take up to 12 minutes.

How to configure RSA 2FA Authentication for AIX

This guide records the way I installed RSA's PAM Agent v7.0.0.484.10_12_10_05_06_01 on AIX 6.1 and AIX 7.1.

Prepare the System

The RSA's PAM Agent requires the following
  • at least AIX 6.1 TL5 (SP2)
  • RSA Authentication v6.1.2, 7.1 SP2 or 7.1 SP3
  • sdconf.rec file from the RSA Authentication Manager and store it at /var/ace on the server.
The following tools are supported
  • telnet
  • login
  • rlogin
  • su
  • ssh, sftp, scp
  • sudo (at least v1.7.3)
You may have a 64-bit OS, but as only 32-bit PAM agent binaries are available, only 32-bit tools are supported.

Configuration of Login control

Enable PAM Authentication in AIX. Change the authentication method to PAM in /etc/security/login.cfg
usw:
        shells = ...  ...
        maxlogins = 32767
        logintimeout = 60
        maxroles = 8
        *auth_type = STD_AUTH
        auth_type = PAM_AUTH
The symbol * is used to comment the whole line as opposed to the usual # symbol.

Installation of PAM Agent

Go to the path where the PAM agent installer resides.
# tar -xvf PAM-Agent_v7.0.0.484.10_12_10_05_06_01.tar
# cd PAM-Agent_v7.0.0.484.10_12_10_05_06_01
# ./install_pam.sh
Provide the correct path to sdconf.rec and confirm. For subsequent installation prompts, accept the default value or enter an appropriate value. Do check that the "VAR_ACE" variable in the /etc/sd_pam.conf file points to the correct location for sdconf.rec. Permission for sdconf.rec should be 600 and ownership root:root.

Configuration of PAM Control

Configure PAM to authenticate using BOTH the local PAM and RSA.
bash-3.2# grep sshd /etc/pam.conf 
sshd    auth    sufficient        pam_securid.so 
sshd    auth    required        pam_aix 
sshd    account sufficient        pam_securid.so 
sshd    account required        pam_aix 
sshd    password  sufficient      pam_securid.so 
sshd    password  required      pam_aix
sshd    session sufficient        pam_securid.so
sshd    session required        pam_aix

Configuration of RSA PAM Agent

Configure the RSA PAM Agent using group control and enable logging of the authentication at /etc/sd_pam.conf
Set "RSATRACELEVEL=1" for logging.
Set "RSATRACEDEST=/var/log/rsa_authlog" for the file to log to.
Set "ENABLE_GROUP_SUPPORT=1" to enable group support
Set "PAM_IGNORE_SUPPORT_FOR_USERS=0" to authenticate by UNIX if a user is not securid authenticated due to user exclusion support.
Set "INCL_EXCL_GROUPS=1" to prompt for securid authentication for the listed group
Set "LIST_OF_GROUPS=other:wheel:staff" for list of group
Set "PAM_IGNORE_SUPPORT=1" to authenticate by UNIX if a user is not securid authenticated due to their group membership.
Set "AUTH_CHALLENGE_PASSWORD_STR=Enter your UNIX PASSWORD :" to be clearer in asking for account password.
Enable logging of all RSA login via syslogd at /etc/syslog.conf
# AUTHENTICATION LOG
auth.info               /var/log/authlog rotate files 12 time 30d compress
Refresh syslogd to take effect
 # refresh -s syslogd

Configuration of SSHD

Edit the sshd configuration file at /etc/ssh/sshd_config.
Set "UsePAM yes" to use PAM authentication
Set "PasswordAuthentication no" to disable password authentication. We have set in pam.conf to authenticate.
Set "ChallengeResponseAuthentication yes"
Set "UsePrivilegeSeparation no"
Refresh the SSHD by restarting it.
# stopsrc -s sshd; startsrc -s sshd
# lssrc -s sshd
Please make sure you back up before changing any configuration files. You have been warned. If you messed up your AIX host, login through HMC and open up a terminal console.

Test

Communication with RSA server test
bash-3.2# /opt/pam/bin/32bit/acestatus

RSA ACE/Server Limits
---------------------
        Configuration Version : 14      Client Retries : 5
        Client Timeout : 5              DES Enabled : Yes

RSA ACE/Static Information
--------------------------
        Service : securid       Protocol : udp  Port Number : 5500

RSA ACE/Dynamic Information
---------------------------
        Server Release : 7.1.2.0        Communication : 5

RSA ACE/Server List
-------------------
        Server Name :           sec-server.com
        Server Address :        10.10.10.22
        Server Active Address : 10.10.10.22
        Master : Yes    Slave : No      Primary : Yes
        Usage : Available for Authentications
------------------------------------------------------------------------------
        Server Name :           sec-server2.com
        Server Address :        10.10.10.23
        Server Active Address : 10.10.10.23
        Master : No     Slave : No      Primary : No
        Usage : Available for Authentications
Basic RSA test using foo account
bash-3.2# /opt/pam/bin/32bit/acetest
Enter USERNAME: foo
Enter PASSCODE:
Authentication successful.
RSA test using ssh protocol for account with membership to groups "wheel" or "staff"
~$ssh foo@myserver
Enter PASSCODE:
Last unsuccessful login: Tue May 10 14:09:32 SGT 2011 on ssh from 10.10.10.2
Last login: Tue May 10 14:17:54 SGT 2011 on /dev/pts/1 from 10.10.10.2
...
...
$ ^D
Connection to myserver closed.
Accounts that do not have membership to groups "wheel" or "staff" are authenticated "normally"
~$ssh appacct@myserver
Enter your UNIX PASSWORD:
Last unsuccessful login: Wed May 25 10:42:05 SGT 2011 on ssh from 10.10.10.2
Last login: Wed May 25 10:44:03 SGT 2011 on /dev/pts/2 from 10.10.10.2
…
…
When your PIN expires, RSA server will prompt you to change.
bash-3.2# ssh foo@myserver
Enter PASSCODE:
To continue you must enter a new PIN.
Are you ready to enter a new PIN? (y/n) [n]: y
Enter a new PIN of 8 alphanumeric characters:
Re-enter new PIN to confirm:
New PIN accepted, press enter to continue.
Enter PASSCODE:
Enter PASSCODE:
1 unsuccessful login attempt since last login.
Last unsuccessful login: Wed May 25 10:09:17 SGT 2011 on ssh
Last login: Wed May 25 10:07:58 SGT 2011 on /dev/pts/1 from 10.10.10.2
...
...

How to configure sendmail in AIX

This guide serves as a record of how I configured sendmail on AIX 7.1 to relay mail to my company's Exchange server.

Get sendmail running first

# lssrc -s sendmail
# ps -aef | grep sendmail
# startsrc -s sendmail -a "-bd -q30m"
# ps -aef | grep sendmail
You should see the following output
root  5704     1   0 11:08:42      -  0:00 sendmail: accepting connections on port 25
Ensure sendmail runs after a reboot. Edit the '/etc/rc.tcpip' file
# vi /etc/rc.tcpip
uncomment the following startup line in ‘/etc/rc.tcpip’
start /usr/lib/sendmail "$src_running" "-bd -q${qpi}"
Make sendmail use the local hosts file for name resolution instead of DNS (the default). Edit the ‘/etc/netsvc.conf’ file:
# vi /etc/netsvc.conf
Add the following:
hosts=local
For security, restrict the permissions on the ‘/etc/netsvc.conf’ file so that only root can access it:
# chmod 600 /etc/netsvc.conf
The IP address and hostname of the Exchange server need to go into the ‘/etc/hosts’ file. First use telnet to verify that the server is reachable on port 25.
# telnet  25
Add the IP and hostname into ‘/etc/hosts’
# vi /etc/hosts
The email daemon requires an FQDN, so say your host is "myserver", append "myserver." to your hostname in /etc/hosts. Back up the original ‘/etc/sendmail.cf’ file before editing it.
# cp /etc/sendmail.cf /etc/sendmail.cf.
Make the following changes:
# vi /etc/sendmail.cf
For AIX 6.1, I changed the Dw line as well. This is not needed in AIX 7.1. Change from:
#DwYourHostName
To:
Dw
Change from:
# "Smart" relay host (may be null)
# Relay host to forward outgoing mail not in the local domain to.
# To forward ALL mail to this relay host, uncomment the appropriate
# rule in ruleset 0, as indicated by the ruleset's comments.
#DSmailer:relayhostname
DS
To:
# "Smart" relay host (may be null)
# Relay host to forward outgoing mail not in the local domain to.
# To forward ALL mail to this relay host, uncomment the appropriate
# rule in ruleset 0, as indicated by the ruleset's comments.
#DSmailer:relayhostname
DS
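After editing, it is easy to double check which relay host sendmail.cf actually defines. Here is a small sketch, assuming the relay is set on an uncommented line starting with "DS" as in the block above. The function name smart_host is my own, and relay.example.com in the usage comment is only a placeholder, not the real Exchange host.

```shell
# Print the smart relay host defined by the active "DS" line in a
# sendmail.cf file. Empty output means no relay host is set.
# Usage: smart_host [sendmail_cf_file]
smart_host() {
    # Commented lines ("#DS...") do not match; if the macro is defined
    # more than once, the last definition is the one reported.
    sed -n 's/^DS//p' "${1:-/etc/sendmail.cf}" | tail -n 1
}

# Example:
#   smart_host /etc/sendmail.cf
#   (prints something like relay.example.com once the DS line is filled in)
```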
Now refresh the ‘sendmail’ daemon with the new changes
# refresh -s sendmail
It will take a few minutes before ‘ps’ shows the daemon accepting connections again:
root  5704     1   0 11:08:42      -  0:00 sendmail: accepting connections on port 25
In the interim period the following will be displayed via ‘ps’:
root  5704     1   0 11:08:42      -  0:00 /usr/lib/sendmail -bd -q30
Test sendmail now.
# echo "test" | sendmail -v yourname@myinbox.com
Enable the mail log by adding the following line to /etc/syslog.conf.
mail.debug     /var/log/maillog rotate files 6 time 30d compress
Now, create the log file and refresh syslogd.
# touch /var/log/maillog
# refresh -s syslogd

How to determine disk space in AIX

As I'm still pretty new to AIX, I need to learn as quickly as possible how to manage the new babies in my care. :) Let's say I want to determine the size of a physical volume. The output below is in MB.

# bootinfo -s hdisk2
51200
To find the free space left in a volume group, use lsvg. Here we can see that about 36 GB (1140 free PPs of 32 MB each, i.e. 36480 MB) is still unassigned in the VG.
# lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  00f603d800004c000000012c441dd962
VG STATE:           active                   PP SIZE:        32 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1918 (61376 megabytes)
MAX LVs:            256                      FREE PPs:       1140 (36480 megabytes)
LVs:                13                       USED PPs:       778 (24896 megabytes)
OPEN LVs:           12                       QUORUM:         1 (Disabled)
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none
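To avoid reading the number off the screen each time, the FREE PPs line can be pulled out with awk. This is only a sketch that assumes the lsvg layout shown above; the function name vg_free_space is my own.

```shell
# Read "lsvg <vgname>" output on stdin and print the free space.
# Usage: lsvg rootvg | vg_free_space
vg_free_space() {
    awk '
        /FREE PPs:/ {
            # What follows "FREE PPs:" looks like "1140 (36480 megabytes)".
            sub(/.*FREE PPs:[ \t]*/, "")
            pps = $1
            mb = $2
            gsub(/[^0-9]/, "", mb)      # strip the "(" in front of the number
            printf "%d PPs free = %d MB (%.1f GB)\n", pps, mb, mb / 1024
        }
    '
}
```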
This is the demo script I have written to check both disk and inode utilisation. Please let me know if you find it useful, or if the gurus want to help me improve it. :)
#!/bin/ksh
#
# FILENAME   : checkDiskSpace.ksh
# AUTHOR     : Victor Kwan
# EMAIL      : victorkk [AT] gmail [DOT] com
# PURPOSE    : To check disk and inode utilisation
#            : and alert sys admin if threshold is breached.
# DATE       : Feb 2011
#

# 
# Parameters setup
OUTPUTFILE_DF="checkDisk.DF.`hostname`.`date '+%d%b%Y'`.output"
NOTIFICATION_MSG="checkDisk.DF.`hostname`.`date '+%d%b%Y'`.message"
isFOUND=0

#
# Input Validation
if [ $# -ne 3 ]
then
        printf "Usage: \n\t%s <disk_threshold> <inode_threshold> <email>\n\n" "$0"
        exit
fi

DISK_THRESHOLD=$1
INODE_THRESHOLD=$2
EMAIL="$3"

#
# Check for Logical Disk Partition full
df -m > ./$OUTPUTFILE_DF

#
# Setup the messages
printf "----------------------------------------\n" > $NOTIFICATION_MSG
printf " Check for Logical Partition Utilisation\n\n" >> $NOTIFICATION_MSG
printf " Disk / Inode Threshold: $DISK_THRESHOLD | $INODE_THRESHOLD \n" >> $NOTIFICATION_MSG
printf "----------------------------------------\n\n" >> $NOTIFICATION_MSG


if [ `cat $OUTPUTFILE_DF | wc -l` == 1 ]
then
        printf "No Partition detected.\n"
else
        cat $OUTPUTFILE_DF | while read LINE
        do
                isFULL_CHECK=`echo $LINE | awk '{print $4}' | grep "%" | wc -l`
                #printf "[DEBUG]isFULL_CHECK is %d.\n" $isFULL_CHECK

                if [ $isFULL_CHECK == 1 ]
                then

                        LDISK=`echo $LINE | awk '{print $1}'`
                        LDISK_PARTITION=`echo $LINE | awk '{print $7}'`
                        LDISK_SIZE=`echo $LINE | awk '{print $2}'`
                        LDISK_USED=`echo $LINE | awk '{print $4}' | awk -F% '{print $1}'`
                        LDISK_INODE=`echo $LINE | awk '{print $6}' | awk -F% '{print $1}'`

                        if [ $LDISK_USED -ge $DISK_THRESHOLD -o $LDISK_INODE -ge $INODE_THRESHOLD ]
                        then
                                #printf "[DEBUG]The line is $LINE\n"
                                # LDISK is the device (df column 1), LDISK_PARTITION the mount point (column 7)
                                printf "Partition \"%s\" mounted on \"%s\"\n" "$LDISK" "$LDISK_PARTITION" >> $NOTIFICATION_MSG
                                printf "Size (Mb) | Disk (%%) | Inode (%%)\n"  >> $NOTIFICATION_MSG
                                printf "%-9.2f | %-8d | %-8d\n\n" $LDISK_SIZE $LDISK_USED $LDISK_INODE >> $NOTIFICATION_MSG
                                isFOUND=1
                        fi
                fi

        done
fi

if [ $isFOUND == 0 ]
then
        printf "All Partition within threshold.\n" >> $NOTIFICATION_MSG

elif [ $isFOUND == 1 ]
then
        cat $NOTIFICATION_MSG | mailx -s "[`hostname`] Disk Space Above Threshold" $EMAIL
fi

#
# Some Housekeeping
rm $OUTPUTFILE_DF
rm $NOTIFICATION_MSG
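For comparison, the threshold check at the heart of the script can also be done in a single awk pass without temporary files. This is a rough sketch and not a replacement for the script above: the function name over_threshold is my own, it assumes the AIX "df -m" column layout (%Used in column 4, %Iused in column 6, mount point in column 7), and it leaves the mailing part out.

```shell
# Read "df -m" output (AIX column layout) on stdin and print every file
# system whose %Used or %Iused is at or above the given thresholds.
# Usage: df -m | over_threshold <disk_pct> <inode_pct>
over_threshold() {
    awk -v disk="$1" -v inode="$2" '
        $4 ~ /%$/ {     # the header and "-" rows (e.g. /proc) are skipped
            used = $4;  sub(/%/, "", used)
            iused = $6; sub(/%/, "", iused)
            if (used + 0 >= disk + 0 || iused + 0 >= inode + 0)
                printf "%s mounted on %s: %s MB, disk %d%%, inode %d%%\n", $1, $7, $2, used, iused
        }
    '
}

# Example:
#   df -m | over_threshold 90 90
```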

How to determine if the new oraerror.dat is loaded in OracleAgent

Recently, I needed to change how VCS manages my DB. To cut the story short, I wanted to check whether the oraerror.dat file is loaded by the Oracle agent, so that its monitoring behaviour follows the oraerror.dat file. If oraerror.dat is not loaded, the agent falls back to its default behaviour (which is FAILOVER). First, query the number of entries in the file.

# egrep -v '^$|^#|}' /opt/VRTSagents/ha/bin/Oracle/oraerror.dat | wc -l
187
Take a core dump of the OracleAgent process with gcore. The data structure "token_data" stores the in-memory copy of oraerror.dat.
# ps -ef |grep OracleAge
    root  5282     1  0   Aug 26 ?        5:10 /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h
    root 23533 29154  0 10:33:32 pts/3    0:00 grep OracleAge
# gcore 5282
gcore: core.5282 dumped
Take a look at the 7th field, which is the number of entries in oraerror.dat plus 1.
# mdb /opt/VRTSagents/ha/bin/Oracle/OracleAgent core.5282
Loading modules: [ libthread.so.1 libc.so.1 ld.so.1 ]
> *token_data/20D
0x97000:        32              663408          0               690328          690424          629608
                188             3               1162824517      1330332928      97              1111573570
                1112492800      80              436440          2               262144          19800
                0               0
Press Ctrl+D to exit mdb. The 7th field (188) matches the number of entries in oraerror.dat (187) plus 1, so the file is loaded. Also look at the permissions of the file! If for some reason the oraerror.dat file was not loaded by the OracleAgent when it started, check that the file is readable and contains valid entries, then restart the Oracle agent.
# ls -l oraerror.dat
-rwxr--r--   1 root     sys         3611 Jul 15  2009 oraerror.dat
# haagent -stop Oracle -force -sys myserver
# haagent -start Oracle -sys myserver

# haagent -display | grep Oracle
Oracle        AgentDirectory /opt/VRTSagents/ha/bin/Oracle
Oracle        AgentFile
Oracle        Faults         0
Oracle        Running        Yes
Oracle        Started        Yes
Please note that if oraerror.dat is modified, the agent has to be restarted in order to recognise the change. Reference: http://www.symantec.com/docs/TECH155872
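The comparison between the file and the 7th field of token_data can be wrapped into two small helpers. This is only a sketch based on what I observed above (the in-memory count being the number of entries plus 1); the function names are my own and nothing here is part of VCS.

```shell
# Count the entries in an oraerror.dat file the same way as the egrep
# earlier: blank lines, comment lines and lines containing "}" are ignored.
# Usage: oraerror_entries <file>
oraerror_entries() {
    grep -E -v '^$|^#|}' "$1" | wc -l | tr -d ' '
}

# Return 0 if the 7th field read from token_data equals entries + 1.
# Usage: oraerror_loaded <file> <token_data_field7>
oraerror_loaded() {
    [ "$(( $(oraerror_entries "$1") + 1 ))" -eq "$2" ]
}

# Example, using the value read from mdb above:
#   oraerror_loaded /opt/VRTSagents/ha/bin/Oracle/oraerror.dat 188 && echo "loaded"
```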

How to disable weak ciphers in Apache

Rationale for disabling weak ciphers

According to the whitepaper from the NLUUG autumn "security" conference in Nov 2010, some Apache configuration updates are required to disable the following weak ciphers.
Note: This is from the NLUUG whitepaper. If the author or publisher does not agree with me posting the screenshot, please let me know immediately and I will remove it. Thanks. The current default setting in my Apache ssl.conf is SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP . The older servers are even using an older openssl! The enabled ciphers are listed below.
# openssl ciphers -v 'ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP'
DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(256)  Mac=SHA1
AES256-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA1
KRB5-DES-CBC3-MD5       SSLv3 Kx=KRB5     Au=KRB5 Enc=3DES(168) Mac=MD5
KRB5-DES-CBC3-SHA       SSLv3 Kx=KRB5     Au=KRB5 Enc=3DES(168) Mac=SHA1
EDH-RSA-DES-CBC3-SHA    SSLv3 Kx=DH       Au=RSA  Enc=3DES(168) Mac=SHA1
EDH-DSS-DES-CBC3-SHA    SSLv3 Kx=DH       Au=DSS  Enc=3DES(168) Mac=SHA1
DES-CBC3-SHA            SSLv3 Kx=RSA      Au=RSA  Enc=3DES(168) Mac=SHA1
DHE-RSA-AES128-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(128)  Mac=SHA1
DHE-DSS-AES128-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(128)  Mac=SHA1
AES128-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(128)  Mac=SHA1
DHE-DSS-RC4-SHA         SSLv3 Kx=DH       Au=DSS  Enc=RC4(128)  Mac=SHA1
KRB5-RC4-MD5            SSLv3 Kx=KRB5     Au=KRB5 Enc=RC4(128)  Mac=MD5
KRB5-RC4-SHA            SSLv3 Kx=KRB5     Au=KRB5 Enc=RC4(128)  Mac=SHA1
RC4-SHA                 SSLv3 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=SHA1
RC4-MD5                 SSLv3 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=MD5
KRB5-DES-CBC-MD5        SSLv3 Kx=KRB5     Au=KRB5 Enc=DES(56)   Mac=MD5
KRB5-DES-CBC-SHA        SSLv3 Kx=KRB5     Au=KRB5 Enc=DES(56)   Mac=SHA1
EDH-RSA-DES-CBC-SHA     SSLv3 Kx=DH       Au=RSA  Enc=DES(56)   Mac=SHA1
EDH-DSS-DES-CBC-SHA     SSLv3 Kx=DH       Au=DSS  Enc=DES(56)   Mac=SHA1
DES-CBC-SHA             SSLv3 Kx=RSA      Au=RSA  Enc=DES(56)   Mac=SHA1
DES-CBC3-MD5            SSLv2 Kx=RSA      Au=RSA  Enc=3DES(168) Mac=MD5
RC2-CBC-MD5             SSLv2 Kx=RSA      Au=RSA  Enc=RC2(128)  Mac=MD5
RC4-MD5                 SSLv2 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=MD5
RC4-64-MD5              SSLv2 Kx=RSA      Au=RSA  Enc=RC4(64)   Mac=MD5
DES-CBC-MD5             SSLv2 Kx=RSA      Au=RSA  Enc=DES(56)   Mac=MD5
EXP-KRB5-RC4-MD5        SSLv3 Kx=KRB5     Au=KRB5 Enc=RC4(40)   Mac=MD5  export
EXP-KRB5-RC2-CBC-MD5    SSLv3 Kx=KRB5     Au=KRB5 Enc=RC2(40)   Mac=MD5  export
EXP-KRB5-DES-CBC-MD5    SSLv3 Kx=KRB5     Au=KRB5 Enc=DES(40)   Mac=MD5  export
EXP-KRB5-RC4-SHA        SSLv3 Kx=KRB5     Au=KRB5 Enc=RC4(40)   Mac=SHA1 export
EXP-KRB5-RC2-CBC-SHA    SSLv3 Kx=KRB5     Au=KRB5 Enc=RC2(40)   Mac=SHA1 export
EXP-KRB5-DES-CBC-SHA    SSLv3 Kx=KRB5     Au=KRB5 Enc=DES(40)   Mac=SHA1 export
EXP-EDH-RSA-DES-CBC-SHA SSLv3 Kx=DH(512)  Au=RSA  Enc=DES(40)   Mac=SHA1 export
EXP-EDH-DSS-DES-CBC-SHA SSLv3 Kx=DH(512)  Au=DSS  Enc=DES(40)   Mac=SHA1 export
EXP-DES-CBC-SHA         SSLv3 Kx=RSA(512) Au=RSA  Enc=DES(40)   Mac=SHA1 export
EXP-RC2-CBC-MD5         SSLv3 Kx=RSA(512) Au=RSA  Enc=RC2(40)   Mac=MD5  export
EXP-RC4-MD5             SSLv3 Kx=RSA(512) Au=RSA  Enc=RC4(40)   Mac=MD5  export
EXP-RC2-CBC-MD5         SSLv2 Kx=RSA(512) Au=RSA  Enc=RC2(40)   Mac=MD5  export
EXP-RC4-MD5             SSLv2 Kx=RSA(512) Au=RSA  Enc=RC4(40)   Mac=MD5  export

Disabling of weak ciphers.

For better security without compromising the user's experience, I will also use SSLv3 or TLSv1 instead of SSLv2. Furthermore, OptRenegotiate is disabled by default.

I changed the cipher suite to SSLCipherSuite ALL:!ADH:!SSLv2:!EXPORT56:!EXPORT40:!RC4:!DES:+HIGH:+MEDIUM:+EXP which does the following:

  • Disable all Anonymous DH key exchange
  • Disable all SSL v2 ciphers
  • Disable all 56-bit export ciphers
  • Disable all 40-bit export ciphers
  • Disable all RC4 ciphers
  • Disable all single DES ciphers
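  • +HIGH and +MEDIUM do not enable anything new: the "+" prefix only moves the matching ciphers (here the AES and 3DES ciphers left over from ALL) to the end of the list
  • +EXP has no effect, because the export ciphers were already permanently removed by !EXPORT56 and !EXPORT40
The ciphers that remain enabled are listed below.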
# openssl ciphers -v 'ALL:!ADH:!SSLv2:!EXPORT56:!EXPORT40:!RC4:!DES:+HIGH:+MEDIUM:+EXP'
DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(256)  Mac=SHA1
AES256-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA1
KRB5-DES-CBC3-MD5       SSLv3 Kx=KRB5     Au=KRB5 Enc=3DES(168) Mac=MD5
KRB5-DES-CBC3-SHA       SSLv3 Kx=KRB5     Au=KRB5 Enc=3DES(168) Mac=SHA1
EDH-RSA-DES-CBC3-SHA    SSLv3 Kx=DH       Au=RSA  Enc=3DES(168) Mac=SHA1
EDH-DSS-DES-CBC3-SHA    SSLv3 Kx=DH       Au=DSS  Enc=3DES(168) Mac=SHA1
DES-CBC3-SHA            SSLv3 Kx=RSA      Au=RSA  Enc=3DES(168) Mac=SHA1
DHE-RSA-AES128-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(128)  Mac=SHA1
DHE-DSS-AES128-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(128)  Mac=SHA1
AES128-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(128)  Mac=SHA1
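Checking a long cipher list by eye is error prone, so a filter for the categories disabled above can help. This is a minimal sketch that assumes the "openssl ciphers -v" column layout shown in this post; the function name weak_ciphers is my own, and the list of what counts as weak is just the one used here (SSLv2, export, RC4, single DES, anonymous authentication).

```shell
# Read "openssl ciphers -v" output on stdin and print only the weak lines.
# Usage: openssl ciphers -v '<cipher string>' | weak_ciphers
weak_ciphers() {
    awk '
        $2 == "SSLv2"  { print; next }    # SSL v2 ciphers
        /export/       { print; next }    # 40/56-bit export ciphers
        /Enc=RC4/      { print; next }    # RC4
        /Enc=DES\(/    { print; next }    # single DES only, not 3DES
        /Au=None/      { print; next }    # anonymous key exchange
    '
}

# With the new cipher string this should print nothing:
#   openssl ciphers -v 'ALL:!ADH:!SSLv2:!EXPORT56:!EXPORT40:!RC4:!DES:+HIGH:+MEDIUM:+EXP' | weak_ciphers
```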

After updating the Apache configuration, it is recommended to test it, e.g. at www.ssllabs.com. The top grade is A. Try testing your web configuration and see if you get an A. :)

Reference: NLUUG Autumn Security Conference Whitepaper

How to disable first time password change in AIX

I'm new to AIX and was getting sick of AIX asking users for a new password when they log in to the server for the first time after root has set their password. After some digging, this turns out to be due to the ADMCHG flag in /etc/security/passwd, as shown below.

ahkow:
        password = Qpkm8APaNsdoI
        lastupdate = 1227154518
        flags = ADMCHG
Clear the ADMCHG flag of the user account with "pwdadm -c <username>".
> pwdadm -c ahkow
now the flag is gone.
ahkow:
        password = Qpkm8APaNsdoI
        lastupdate = 1227154518
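To find every account that still carries the flag, the stanza file can be scanned with awk. This is a sketch under the assumption that the file uses the stanza layout shown above (a "name:" header followed by indented attributes); the function name admchg_users is my own.

```shell
# Print the users whose stanza carries the ADMCHG flag.
# Usage: admchg_users [passwd_stanza_file]
admchg_users() {
    awk '
        # A stanza header is an unindented line ending in ":".
        /^[^ \t#].*:[ \t]*$/ { user = $1; sub(/:$/, "", user); next }
        $1 == "flags" && /ADMCHG/ { print user }
    ' "${1:-/etc/security/passwd}"
}

# Example (as root), clearing the flag for everyone found:
#   for u in $(admchg_users); do pwdadm -c "$u"; done
```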

Disabling TRACK / TRACE in Apache

Currently, Apache does not deny TRACE requests (per RFC2616) by default. By definition, the HTTP TRACE method asks a web server to echo the contents of the request back to the client for debugging purposes. The complete request, including HTTP headers, is returned in the entity-body of the TRACE response. Here is an example of Apache's response when TRACE is enabled:

# telnet myserver.com 80
Trying 10.10.10.10...
Connected to myserver. (10.10.10.10).
Escape character is '^]'.
TRACE / HTTP/1.0
Host: myserver.com
TestA: Hello
TestB: World

HTTP/1.1 200 OK
Date: Tue, 19 Jul 2011 10:31:38 GMT
Server: Apache
Connection: close
Content-Type: message/http

TRACE / HTTP/1.0
Host: myserver.com
TestA: Hello
TestB: World

Connection closed by foreign host.
The 2nd and 3rd blocks of output are the response from Apache, which echoes back exactly the data sent in the 1st block. The status code of 200 indicates that the TRACE request is allowed. An attacker can prepare a carefully crafted page that tricks the browser on a user’s box into issuing a TRACE request, and then passes the echoed cookies or authentication data on to the attacker.

TRACE requests can be disabled by changing the Apache server configuration. There are 2 methods to achieve this:
1) Setting “TraceEnable off” in httpd.conf. This is only available in Apache 1.3.34, 2.0.55 and 2.2.x and newer.
2) Using rewrite rules to deny TRACE requests in every vhost. This can be used in all Apache versions.

You can deny TRACE requests for HTTP, HTTPS or both depending on your business system requirements, using whichever method suits your Apache version. Below is an example of method 2 in the configuration file httpd.conf. I have added additional lines to log when the rules are invoked.

After note: for Apache 1.3.34, 2.0.55, 2.2.x and newer, please use option 1. Older versions have to use option 2 in every VirtualHost block.

    Servername myserver.com
    ErrorLog logs/myserver-error.log
    CustomLog logs/myserver-access.log common

    # Block TRACE/TRACK XSS vector
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} ^TRAC(E|K)
    RewriteRule .* - [F]
    RewriteLogLevel 9
    RewriteLog logs/rewrite_log
 
After TRACE support is disabled, below is the rerun of the TRACE request.
# /usr/local/apache2/conf>telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
TRACE / HTTP/1.0
Host: myserver.com
TestA: Hello
TestB: World

HTTP/1.1 403 Forbidden
Date: Thu, 21 Jul 2011 08:06:02 GMT
Server: Apache
Content-Length: 202
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

403 Forbidden

Forbidden

You don't have permission to access / on this server.
Connection to localhost closed by foreign host.
We can see that Apache now responds with a 403 Forbidden status code, which indicates that the client request was denied. The log output below shows the rewrite rules being invoked.
10.10.10.10 - - [21/Jul/2011:17:09:34 +0800] [myserver.com/sid#552add28f8][rid#552b2f0bf8/initial] (2) init rewrite engine with requested uri /
10.10.10.10 - - [21/Jul/2011:17:09:34 +0800] [myserver.com/sid#552add28f8][rid#552b2f0bf8/initial] (3) applying pattern '.*' to uri '/'
10.10.10.10 - - [21/Jul/2011:17:09:34 +0800] [myserver.com/sid#552add28f8][rid#552b2f0bf8/initial] (4) RewriteCond: input='TRACE' pattern='^TRAC(E|K)' => matched
10.10.10.10 - - [21/Jul/2011:17:09:34 +0800] [myserver.com/sid#552add28f8][rid#552b2f0bf8/initial] (2) forcing '/' to be forbidden
The examples so far only cover HTTP. The XST vulnerability can also be demonstrated over HTTPS: just use “openssl s_client -connect hostname:port” and then type the same commands as after the telnet command. Please note that this is not a vulnerability in TRACE, nor in Apache. It is more a matter of hardening Apache so that it does not divulge more information than it should.
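The manual telnet check can be turned into a small pass/fail test that looks only at the status line of the response. This is a minimal sketch: the function name trace_blocked is my own, and it treats 403 (the rewrite rule above) and 405 (which, as far as I know, is what Apache returns with TraceEnable off) as "blocked".

```shell
# Read a raw HTTP response on stdin and report whether TRACE was refused.
# Returns 0 for a 403 or 405 status line, 1 otherwise.
trace_blocked() {
    # The status line is the first line, e.g. "HTTP/1.1 403 Forbidden".
    status=$(sed -n '1p' | tr -d '\r' | awk '{ print $2 }')
    case "$status" in
        403|405) echo "TRACE blocked ($status)"; return 0 ;;
        *)       echo "TRACE not blocked (${status:-no status})"; return 1 ;;
    esac
}

# Example over HTTPS (hostname and port are placeholders):
#   printf 'TRACE / HTTP/1.0\r\nHost: myserver.com\r\n\r\n' |
#       openssl s_client -quiet -connect myserver.com:443 2>/dev/null |
#       trace_blocked
```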


Reference:
http://www.kb.cert.org/vuls/id/867593
http://www.apacheweek.com/issues/03-01-24

How To Allow Applications to connect to MQ6

On MQ v6, MCAUSER in SYSTEM.DEF.SVRCONN is blank by default, hence this would not allow anyone to connect to it.

This is an example of the error from the application when it couldn't connect even though LDAP setup is correct. Basic network connectivity test by telnet is OK.

2008-04-06 11:11:10,452 [main] ERROR - Hit exception at init JMSSubscriber, System reconnecting after [10] seconds ....
com.psa.infra.messaging.PMException
        at com.psa.infra.messaging.PMSessionFactory.createConnection(PMSessionFactory.java:260)
        at com.psa.infra.messaging.PMSessionFactory.getConnection(PMSessionFactory.java:168)
        at com.psa.infra.messaging.PMSession.getPMJMSSession(PMSession.java:216)
        at com.psa.infra.messaging.PMSession.createHandler(PMSession.java:101)
        at com.psa.infra.messaging.PMSession.createHandler(PMSession.java:58)
        at com.psa.common.utility.jms.subscriber.JMSSubscriber.init(JMSSubscriber.java:95)
        at com.psa.common.utility.core.mesg.MessageTransfer.process(MessageTransfer.java:94)
        at com.psa.common.utility.core.mesg.MessageTransfer.main(MessageTransfer.java:160)
Caused by: javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager for 'png1comm2_ica:PSA.QUEUE.MANAGER'
        at com.ibm.mq.jms.services.ConfigEnvironment.newException(ConfigEnvironment.java:434)
        at com.ibm.mq.jms.MQQueueConnection.createQM(MQQueueConnection.java:478)
        at com.ibm.mq.jms.MQQueueConnection.<init>(MQQueueConnection.java:182)
        at com.ibm.mq.jms.MQQueueConnectionFactory.createQueueConnection(MQQueueConnectionFactory.java:166)
        at com.psa.infra.messaging.PMSessionFactory.createConnection(PMSessionFactory.java:234)
        ... 7 more
2008-04-06 11:11:10,453 [main] INFO - -- Close JMS subscriber handler

To allow applications to connect, enter the following command after starting 'runmqsc'.

> alter CHANNEL ('SYSTEM.DEF.SVRCONN') CHLTYPE(SVRCONN) MCAUSER('mqm')