Tuesday, April 29, 2008

Save an expired VxVM system

VxVM is normally installed with a 30-day demo license. Once that expires, the next reboot will load only the volumes essential to booting, meaning the following volumes will be disabled and not mounted. This also means that until you reboot, your system will keep working as though the VxVM license had not expired.

/opt
/var
swapvol

My experience was that I couldn't even use 'vi', as swapvol was not available. So basically, there's not much you can do.

So, go to single-user mode and check your Volume Manager license. You should see that your temp/demo license has expired.


# /usr/sbin/vxlicrep | more


So now we try to install the permanent license. Key in your license key when prompted.

# /usr/sbin/vxlicinst

VERITAS License Manager vxlicinst utility version 3.02.005
Copyright (C) 1996-2004 VERITAS Software Corp. All Rights reserved.

Enter your license key : XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XX

License key successfully installed for VERITAS Volume Manager


Note: License keys can coexist on the same machine, i.e. if you had installed a previous license, you can keep adding new ones.
Note: Licenses are stored in /etc/vx/licenses/lic

Tip:
If your system does not have the vxlicinst command, you do not have the VRTSvlic package. Try mounting a remote /opt/VRTS/ from a server in the same network segment.


Now check the status. You will see something like this (only part of the listing is shown):

# vxprint -ht

v opt - DISABLED ACTIVE 94372224 ROUND - fsgen
pl opt-01 opt DISABLED ACTIVE 94372224 CONCAT - RW
sd rootdisk-05 opt-01 rootdisk 146798976 94372224 0 c1t0d0 ENA
pl opt-02 opt DISABLED ACTIVE 94372224 CONCAT - RW
sd rootmirror-05 opt-02 rootmirror 146798976 94372224 0 c1t1d0 ENA

v swapvol - DISABLED ACTIVE 31423488 ROUND - swap
pl swapvol-01 swapvol DISABLED ACTIVE 31423488 CONCAT - RW
sd rootdisk-02 swapvol-01 rootdisk 41945472 31423488 0 c1t0d0 ENA
pl swapvol-02 swapvol DISABLED ACTIVE 31423488 CONCAT - RW
sd rootmirror-02 swapvol-02 rootmirror 41945472 31423488 0 c1t1d0 ENA

v var - DISABLED ACTIVE 52447104 ROUND - fsgen
pl var-01 var DISABLED ACTIVE 52447104 CONCAT - RW
sd rootdisk-04 var-01 rootdisk 94351872 52447104 0 c1t0d0 ENA
pl var-02 var DISABLED ACTIVE 52447104 CONCAT - RW
sd rootmirror-04 var-02 rootmirror 94351872 52447104 0 c1t1d0 ENA


To save yourself (haha...) and the server, run the following for every volume that is disabled:


# vxvol -g <diskgroup> start <volume>
e.g. vxvol -g rootdg start opt
e.g. vxvol -g rootdg start var
e.g. vxvol -g rootdg start swapvol
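
If there are many disabled volumes, the names can be pulled out of the vxprint listing instead of being typed one by one. This is only a sketch: the function name disabled_volumes is made up, and it assumes the "vxprint -ht" layout shown above, where volume records start with "v", the name is the second field and the kernel state is the fourth.

```shell
# Sketch: list the volumes that vxprint reports as DISABLED.
# Reads "vxprint -ht" style output on stdin, prints one volume name per line.
# Assumes the record layout shown in the listing above (hypothetical helper).
disabled_volumes() {
    awk '$1 == "v" && $4 == "DISABLED" { print $2 }'
}

# Possible use on the affected host (commented out, not run here):
#   vxprint -g rootdg -ht | disabled_volumes | while read vol; do
#       vxvol -g rootdg start "$vol"
#   done
```

Keeping the parsing separate from the vxvol call lets you eyeball the list of volumes before starting anything.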


The volumes should now be enabled.

# vxprint -ht


That's it. Reboot and check your system.

To clean up defunct processes

Use the preap command with the PID of the defunct (zombie) process: preap <pid>

Find out the environment the process is running in.

For process 27480, for example, you can use pargs as below to check the environment the process is running in.


pargs -e 27480
envp[0]: PWD=/opt/oracle/product/9.2.0/dbs
envp[1]: TZ=Singapore
...
...


This is useful together with the following commands as well. They show the full path of the process, and also the connections and sub-processes that it has made.


/usr/ucb/ps -auxxw | grep 27480
lsof | grep 27480

"unable to qualify my own domain name" error

Not so long ago I saw these errors on my console.

Symptom 1:


/var/adm/messages
Mar 13 11:15:16 pnsgsit1gw1 sendmail[8420]: [ID 702911 mail.crit] My unqualified host name (pnsgsit1gw1.) unknown; sleeping for retry
Mar 13 11:15:16 pnsgsit1gw1 sendmail[8421]: [ID 702911 mail.crit] My unqualified host name (pnsgsit1gw1.) unknown; sleeping for retry
Mar 13 11:16:16 pnsgsit1gw1 sendmail[8421]: [ID 702911 mail.alert] unable to qualify my own domain name (pnsgsit1gw1.) -- using short name
Mar 13 11:16:16 pnsgsit1gw1 sendmail[8420]: [ID 702911 mail.alert] unable to qualify my own domain name (pnsgsit1gw1.) -- using short name

Symptom 2:
When you manually telnet localhost 25 or use mailx, either you cannot send to a particular domain or the response is very slow after entering the "mail from" command.

Resolution:
Update /etc/hosts (and /etc/inet/ipnodes if you are on Solaris 10) as follows. The first listing is /etc/hosts and the second is /etc/inet/ipnodes.

root@abc:/>more /etc/hosts
# internet host table
#====================
127.0.0.1 localhost
10.100.127.105 abc. abc
10.100.63.32 abc-rsc


#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
10.100.127.105 abc. abc loghost

Description:
It turns out that in Solaris 10, the OS goes through /etc/inet/ipnodes for IPv4 addresses before going to /etc/hosts. In this case, if you have LDAP configured and LDAP does not have the entry, the OS goes straight to /etc/inet/ipnodes.

This also means that if you change the host IP of a Solaris 10 server, you must change it in /etc/inet/ipnodes as well, otherwise the two files will give conflicting IP addresses.

Note that this is Solaris-specific.
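
To catch the "changed hosts but forgot ipnodes" case, the two files can be compared. The function below is a rough sketch, not a Sun tool: the name check_ipnodes_sync is made up, it only looks at the first name on each line, and it assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# Sketch: report IPv4 entries in the hosts file whose first name is missing
# from, or mapped to a different address in, the ipnodes file.
# $1 = hosts file, $2 = ipnodes file (defaults are the Solaris 10 paths).
# Hypothetical helper; assumes a POSIX awk.
check_ipnodes_sync() {
    hosts=${1:-/etc/inet/hosts}
    ipnodes=${2:-/etc/inet/ipnodes}
    awk '
        /^[ \t]*#/ || NF < 2  { next }                  # skip comments and blank lines
        $1 ~ /:/              { next }                  # skip IPv6 entries
        FILENAME == ARGV[1]   { addr[$2] = $1; next }   # ipnodes: remember name -> address
        !($2 in addr)         { print "missing in ipnodes: " $2 " " $1; next }
        addr[$2] != $1        { print "mismatch: " $2 " hosts=" $1 " ipnodes=" addr[$2] }
    ' "$ipnodes" "$hosts"
}
```

Running check_ipnodes_sync with no arguments after an IP change should print nothing when the two files agree.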

# man ipnodes
...
...
NOTES
IPv4 addresses can be defined in the ipnodes file or in the
hosts file. See hosts(4). The ipnodes file will be searched
for IPv4 addresses when using the getipnodebyname(3SOCKET)
API. If no matching IPv4 addresses are found in the ipnodes
file, then the hosts file will be searched. To prevent
delays in name resolution and to keep /etc/inet/ipnodes and
/etc/inet/hosts synchronized, IPv4 addresses defined in the
hosts file should be copied to the ipnodes file.
...
..

# more /etc/nsswitch.conf
...
...
# consult /etc "files" only if ldap is down.
hosts: ldap [NOTFOUND=continue] files
...
...
# Note that IPv4 addresses are searched for in all of the ipnodes databases
# before searching the hosts databases.
...
...


---------------------------
Addition.

Here are 2 commands that you can use to check whether the host has a fully qualified name.

root:/>/usr/lib/mail/sh/check-hostname
Hostname abc OK: fully qualified as abc.

root:/>/usr/lib/mail/sh/check-permissions
No unsafe directories found.

"WARNING: add_spec: No major number for mpt" error

If you hit the following error when you reboot the server, just as the server is almost ready, it will be repeated in large numbers. The errors usually appear on your console and in /var/adm/messages.

"WARNING: add_spec: No major number for mpt"

You may want to try the solutions below, which come from SunSolve. I have tried workaround 2 successfully.


Document Audience: SPECTRUM
Document ID: 74181
Title: Solaris[TM]: "WARNING: add_spec: No major number for mpt"
Update Date: Wed Sep 29 00:00:00 MDT 2004
Products: Solaris 8 Operating System, Solaris 9 Operating System
Technical Areas: Patch

--------------------------------------------------------------------------------


--------------------------------------------------------------------------------

Keyword(s):add_spec, mpt, major number

Problem Statement:

After installing a patch to update /etc/driver_classes with
an "mpt" entry (for example, 108528-xx with xx>21), the system may
generate the following WARNING messages at boot time:

SunOS Release 5.8 Version Generic_108528-29 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
WARNING: add_spec: No major number for mpt
WARNING: add_spec: No major number for mpt
WARNING: add_spec: No major number for mpt
WARNING: add_spec: No major number for mpt
[...]
WARNING: add_spec: No major number for mpt
WARNING: add_spec: No major number for mpt
WARNING: add_spec: No major number for mpt
configuring IPv4 interfaces: hme0.
configuring IPv6 interfaces: hme0.

The following message appears in the /var/sadm/patch//log file
when the entry has not been successfully added to the /etc/name_to_major
file:
SUNWcsr: failed to add mpt to /etc/name_to_major:
(mpt) already in use as a driver or alias.

Explanation:
============
The problem is due to a reference in /etc/driver_classes to the
mpt driver. The mpt driver isn't added to the system because the
/etc/driver_aliases file already had an entry with the mpt driver. Because
of this, the add_drv failed when trying to install the driver.

This is an example of the inconsistencies described in Bug ID 4939994,
"Inconsistency between name_to_major and driver_aliases."


Resolution:

To solve this problem, a script has been created to identify the
inconsistencies between /etc/driver_aliases and /etc/name_to_major.
This script can also add the missing entries if they are listed in a
"reference" name_to_major file.

Here is an example of the way to use the attached script:

# cksum InconsitencyFixTool.tar.gz
1091730651 4410 InconsitencyFixTool.tar.gz

# gzip -dc InconsitencyFixTool.tar.gz | tar -xvf -
x InconsitencyFixTool, 0 bytes, 0 tape blocks
x InconsitencyFixTool/README, 1364 bytes, 3 tape blocks
x InconsitencyFixTool/name_to_major.InconsitencyFix, 4928 bytes, 10 tape blocks
x InconsitencyFixTool/name_to_major.i386.5.8, 841 bytes, 2 tape blocks
x InconsitencyFixTool/name_to_major.i386.5.9, 889 bytes, 2 tape blocks
x InconsitencyFixTool/name_to_major.sparc.5.8, 1831 bytes, 4 tape blocks
x InconsitencyFixTool/name_to_major.sparc.5.9, 1815 bytes, 4 tape blocks

# cd InconsitencyFixTool
# ./name_to_major.InconsitencyFix
Saving original files into .
Inconsistency found between //etc/driver_aliases and //etc/name_to_major on
the following driver(s):
mpt

Add mpt to //etc/name_to_major ? [y] y
Adding the following devices to //etc/name_to_major :
mpt 215
#

Reboot the system.


Temporary Workaround:

If the above script cannot be used, there are two other ways to fix the
problem:

1. Remove the "mpt" lines from /etc/driver_classes and /etc/driver_aliases.

They should look like the following:
driver_aliases:mpt "pci1000,30"
driver_classes:mpt scsi

Next, install (or remove and reinstall) a patch updating all
those files [/etc/driver_aliases /etc/driver_classes
/etc/name_to_major]. For example, install the mpt patch 115275-01 (or
above) or a new kernel Update patch.

OR

2. Add the missing "mpt" entry in the /etc/name_to_major file
to correct the problem with the patch installation.

You must manually append the "mpt" entry at the end
of the /etc/name_to_major file as follows:

mpt XXX

Where XXX is the maximum+1 of the numbers already given to
the other drivers in this file. Separate "mpt" and the given number with
a space character.

For example:

# tail /etc/name_to_major
fasttrap 223
dmfe 224
todds1307 225
pool 226
zcons 227
ipf 228
pfil 229
ctsmc 230
bl 231
mpt 232

Reboot the system.
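
(My own note, not part of the SunSolve document.) The "maximum+1" number in workaround 2 can be computed instead of read off by eye. A small sketch, with a made-up function name, assuming the file has one "name number" pair per line as in the tail output above:

```shell
# Sketch: print the next free major number, i.e. the highest number in a
# name_to_major style file plus one. $1 = file (defaults to /etc/name_to_major).
# Hypothetical helper, not part of the Sun fix tool.
next_major() {
    awk '$2 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0 }
         END { print max + 1 }' "${1:-/etc/name_to_major}"
}

# Possible use on the affected host, after backing up the file and checking
# that no mpt line exists yet (commented out, not run here):
#   cp /etc/name_to_major /etc/name_to_major.orig
#   echo "mpt `next_major`" >> /etc/name_to_major
```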



Additional Information:

History:
========
Two patches have been identified that can cause the problem:

108974-31(or greater): SunOS 5.8: dada, uata, dad, sd, ssd and scsi
drivers patch

OR

109885-14 SunOS 5.8: glm patch

Only on top of a kernel patch 108528-xx where xx <= 21.

Note: If you have a more recent version than -21 before installing
the above patches, the problem will not occur.

This problem can also exist on Solaris 9 - if S9 was installed with
an upgrade from a S8 system where the inconsistency existed.

Impact:
=======
The mpt driver would not be installed correctly and cannot be used.


Attachments:

76676 - InconsistencyFixTool.tar.gz

Creation of metadevice

Here are the steps I used to create metadevices for an n-way mirror.

Assume c0t0d0 is already set up and c0t1d0 is not. Copy the partition table across as below; you might want to save a copy of the VTOC first.


prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2


We need a majority (0.5n+1) of the state database replicas for recovery to work, so create a minimum of 3 replicas per disk, here in the last slice (s7).

metadb -f -a -c 3 c0t0d0s7
metadb -a -c 3 c0t1d0s7
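
The (0.5n+1) rule is easy to check with a bit of arithmetic. The sketch below is only an illustration (the function name is made up, and it needs ksh or another POSIX shell for the $(( )) arithmetic). As far as I understand it, Solaris Volume Manager keeps running with half of the replicas but will not boot to multi-user without a majority, so with 3 replicas on each of two disks, losing one disk leaves you exactly one replica short.

```shell
# Sketch: does a majority (n/2 + 1) of state database replicas survive?
# $1 = total number of replicas, $2 = replicas lost with the failed disk.
# Hypothetical helper for illustration only.
replica_check() {
    total=$1
    lost=$2
    left=$((total - lost))
    majority=$((total / 2 + 1))
    if [ "$left" -ge "$majority" ]; then
        echo "majority kept: $left of $total (need $majority)"
    else
        echo "majority lost: $left of $total (need $majority)"
    fi
}
```

For example, "replica_check 6 3" covers the two-disk layout above, and "replica_check 9 3" shows what a third disk with its own replicas would buy you.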


Creating the mirror for the root disk

metainit -f d10 1 1 c0t0d0s0
metainit d20 1 1 c0t1d0s0
metainit d0 -m d10


Configure the system to boot the root filesystem from the metadevice, using the "metaroot" command. This will make the necessary changes to /etc/vfstab and /etc/system:

metaroot d0


You may want to get the name of what is now the raw root disk, in case we need it later:

ls -l /dev/rdsk/c0t1d0s0


Create a 1-way mirror for each of the other slices. Use the "-f" option to force the creation of the first submirror, since its slice is mounted. You can create the mirrors for the subsequent slices using the same method.

metainit -f d13 1 1 c0t0d0s3
metainit d23 1 1 c0t1d0s3
metainit d3 -m d13


Modify /etc/vfstab so that the system mounts the file systems from the metadevices:

#device          device            mount   FS    fsck   mount     mount
#to mount        to fsck           point   type  pass   at boot   options
#
/dev/md/dsk/d3   /dev/md/rdsk/d3   /opt/   ufs   2      yes       logging
/dev/md/dsk/d4   /dev/md/rdsk/d4   /var    ufs   2      yes       logging
/dev/md/dsk/d5   /dev/md/rdsk/d5   /usr/   ufs   2      yes       logging


To be on the safe side, flush all mounted UFS file systems (including any pending UFS logging data) to disk before you reboot.

lockfs -fa
reboot


Now attach the secondary mirror to create 2-way mirror for s3, s4, s5

metattach d3 d23
metattach d4 d24
metattach d5 d25


After completion of the syncing process (you can monitor using metastat), you may want to reboot the server to check that all is well.

More reference:
http://bradthemad.org/tech/notes/disksuite_mirroring.php
http://www.sun.com/bigadmin/content/submitted/metadevices.html


---------------------------------------
Addition.

From Solaris 10 6/06 (Update 2) onwards, suppose you have a production server whose root drive has not been mirrored and where root and swap take up the entire disk...

You may try to steal a few cylinders from swap to create the metadb, but you are in for a big surprise :) Solaris 10 does not allow you to do that. See below.


# swap -d /dev/dsk/c0t0d0s1
# format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c0t0d0
1. c0t1d0
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
Warning: Current Disk has mounted partitions.
/dev/dsk/c0t0d0s0 is currently mounted on /. Please see umount(1M).
/dev/dsk/c0t0d0s1 is currently used by swap. Please see swap(1M).

format> par
partition> 1
Part Tag Flag Cylinders Size Blocks
1 swap wu 0 - 429 989.34MB (430/0/0) 2026160

Enter partition id tag[swap]:
Enter partition permission flags[wu]:
Enter new starting cyl[0]:
Enter partition size[2026160b, 430c, 429e, 989.34mb, 0.97gb]: 414c
partition> 7
Part Tag Flag Cylinders Size Blocks
7 unassigned wm 0 0 (0/0/0) 0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 414
Enter partition size[141360b, 30c, 443e, 69.02mb, 0.07gb]: 16c

partition> l
Cannot label disk when partitions are in use as described.


Haha.. here's the workaround I found on the web:

1) Start format
2) select your new mirror disk, in this case c0t1d0
3) Partition it like your current root disk, except you really want to set it up with a smaller swap and a new slice 7 for metadb. Leave all other slices alone.

Here’s the interesting bit: you write the VTOC from the mirror disk to the root disk; usually you’d do it the other way round.

# prtvtoc -h /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2

That’s it, you now have two identical VTOCs and a slice 7 for metadbs... I think Sun is trying to reduce the number of incidents where we mess up our systems??

Taking IPMP offline for maintenance

Found this gem for use in Solaris 10 on some Sun blog site.

Solaris IPMP (IP multipathing) lets a server keep operating if a network interface or switch fails. Periodically you may need to take IPMP-managed interfaces offline but still keep the IP addresses attached to those interfaces up and operational.

Solaris now comes with the if_mpadm utility, which provides a simple and straightforward way to take IPMP-managed interfaces online and offline.

Before using if_mpadm, it is useful to check the status of the interface you want to take online or offline. Run ifconfig and look at the interface in question (in this case ni0):


$ ifconfig -a

lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ni0: flags=201000843 mtu 1500 index 2
inet 192.168.1.5 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:45:e8:33:3c:97
ni1: flags=201000843 mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:72:b6:d3:ee:35
lo0: flags=2002000849 mtu 8252 index 1
inet6 ::1/128

To take the interface ni0 offline for maintenance, the if_mpadm utility can be run with the “-d” option (take interface offline), and the name of the interface to take offline:

$ if_mpadm -d ni0


Once if_mpadm does its job, the interface will be in the OFFLINE state, and the IP addresses attached to that interface will have migrated to another device in the IPMP group:

$ ifconfig -a

lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ni0: flags=289000842 mtu 0 index 2
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:45:e8:33:3c:97
ni1: flags=201000843 mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:72:b6:d3:ee:35
ni1:1: flags=201000843 mtu 1500 index 3
inet 192.168.1.5 netmask ffffff00 broadcast 192.168.1.255
lo0: flags=2002000849 mtu 8252 index 1
inet6 ::1/128

After you finish your maintenance, you can use the if_mpadm “-r” option (bring interface online) to bring the interface online:

$ if_mpadm -r ni0

Once if_mpadm completes, you can use the ifconfig utility to verify the interface is back up, and the IP addresses have migrated back to the original adaptor (you can disable automatic failback by setting FAILBACK to no in /etc/default/mpathd):

$ ifconfig -a

lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ni0: flags=201000843 mtu 1500 index 2
inet 192.168.1.5 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:45:e8:33:3c:97
ni1: flags=201000843 mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname ipmp0
ether 0:72:b6:d3:ee:35
lo0: flags=2002000849 mtu 8252 index 1
inet6 ::1/128


Check out http://blogs.sun.com/meem/date/20070425

Forward/Reverse Proxying

This post is actually a brain dump of what I know so far from reading the numerous 'guides' out there.

Reverse Proxy (e.g. Webmail)

"Reverse" means the direction from the "outside world" to the "inside world": the proxy sits in front of an internal server.
Terms to use in /usr/local/apache2/conf/httpd.conf : --> ProxyPass and ProxyPassReverse

Scenario

1) User --> webmail.com

2) The user actually reaches the proxy, which passes the request on to the internal webmail server.

3) This is used to control incoming access (with respect to the web server)


Forward Proxy (e.g. Internal programs accessing resources outside the network)

The term to use is --> ProxyRequests On

1) This is to control internal applications' access to the Internet / Extranet.

2) Can use this to control the type of access and the type of traffic allowed.

Moving Large directories in Solaris

When moving or copying really large directories on Solaris, you can sometimes run into trouble, especially when some of the files in those directories are larger than 8 gigabytes.

One solution is to use a “ufsdump pipe to ufsrestore” command, but even this has had problems from time to time.

After experimenting and checking some forums, I think the safer method is a “tar pipe to tar” command. Be aware, however, that on Solaris you will need to include the “E” flag if the directory you are copying contains files larger than 8 gigabytes.

Here is an example. Simply cd into the directory you want to copy and execute the following command, replacing “/new/directory” with the path to the new destination directory.

tar cpBEf - * | (cd /new/directory; tar xBEf -)
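
One thing to watch with the command above: "*" does not match dot-files, so hidden files in the top directory are left behind. Below is a sketch of the same tar pipe wrapped in a function (the name tar_copy is made up) that uses "." instead. It adds the E flag only when uname reports SunOS, on the assumption that other tar implementations do not know that flag.

```shell
# Sketch: copy the contents of directory $1 into the existing directory $2
# with a tar pipe. Uses "." rather than "*" so that dot-files come along.
# The E flag (extended headers for files over 8 GB) is added only on SunOS;
# treating it as Solaris-only is an assumption of this sketch.
tar_copy() {
    src=$1
    dst=$2
    cflags=cpBf
    xflags=xBf
    if [ "`uname -s`" = "SunOS" ]; then
        cflags=cpBEf
        xflags=xBEf
    fi
    (cd "$src" && tar $cflags - .) | (cd "$dst" && tar $xflags -)
}
```

For example, "tar_copy /old/directory /new/directory". Using "cd ... &&" instead of "cd ...;" means nothing is extracted into the wrong place if the destination does not exist.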

Solaris 10 Fibre Channel Management

On Solaris 10, Fibre Channel management is easy, because the SAN Foundation Kit is now integrated into the base OS. The utility is called "fcinfo".

The "fcinfo" utility shows Fibre Channel connectivity information.
"fcinfo" is especially useful because it ships with the base operating system and shows HBA and connectivity information, including for HBAs from Emulex, JNI and QLogic.

Warning!! Run it only while the server's fibre links are in a steady state (online or offline), and never run it repeatedly while you are disrupting a fibre link, or else you may not be able to bring the link back up until you reboot the server.


# uname -a
SunOS 5.10

# fcinfo -V
fcinfo: Version 1.0
For more information, please see fcinfo(1M)

# fcinfo hba-port

HBA Port WWN: 210000e08b8f29bf
OS Device Name: /devices/pci@84,4000/fibre-channel@3:devctl
Manufacturer: QLogic Corporation
Model: QLA2340
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b8f29bf

HBA Port WWN: 10000000c9581765
OS Device Name: /dev/cfg/c3
Manufacturer: Emulex
Model: LP9802
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 20000000c9581765

...
...

The “-l” option lists the link error statistics for the port.

# fcinfo hba-port -l

HBA Port WWN: 210000e08b8f29bf
OS Device Name: /devices/pci@84,4000/fibre-channel@3:devctl
Manufacturer: QLogic Corporation
Model: QLA2340
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b8f29bf
Error: SendRLS failed for 210000e08b8f29bf

HBA Port WWN: 10000000c9581765
OS Device Name: /dev/cfg/c3
Manufacturer: Emulex
Model: LP9802
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 20000000c9581765
Link Error Statistics:
Link Failure Count: 1
Loss of Sync Count: 6
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 120
Invalid CRC Count: 0

HBA Port WWN: 10000000c9582596
OS Device Name: /dev/cfg/c4
Manufacturer: Emulex
Model: LP9802
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 20000000c9582596
Link Error Statistics:
Link Failure Count: 1
Loss of Sync Count: 6
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 8
Invalid CRC Count: 0

To check the remote port

# fcinfo remote-port -slp 2100001b320616d2
Remote Port WWN: 50060e8005626100
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 50060e8005626100
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 0
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
LUN: 0
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d0s2
LUN: 1
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d1s2
LUN: 2
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d2s2
LUN: 3
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d3s2
LUN: 4
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d4s2
LUN: 5
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d5s2
LUN: 6
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d6s2
LUN: 7
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c2t50060E8005626100d7s2

To check the details of a particular fibre link, especially during setup, use luxadm to match the link and its WWN.

root # luxadm -e dump_map /devices/pci@7,700000/SUNW,qlc@0/fp@0,0:devctl
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 80f00 0 50060e8005626110 50060e8005626110 0x0 (Disk device)
1 82400 0 2100001b320610cd 2000001b320610cd 0x1f (Unknown Type,Host Bus Adapter)

root # fcinfo remote-port -slp 2100001b320610cd
Remote Port WWN: 50060e8005626110
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 50060e8005626110
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 0
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
LUN: 0
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d0s2
LUN: 1
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d1s2
LUN: 2
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d2s2
LUN: 3
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d3s2
LUN: 4
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d4s2
LUN: 5
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d5s2
LUN: 6
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d6s2
LUN: 7
Vendor: HITACHI
Product: OPEN-V -SUN
OS Device Name: /dev/rdsk/c3t50060E8005626110d7s2

Check where a user is logged in and its MAC address

This Windows command shows who is logged in to a particular machine and the machine's MAC address.


nbtstat.exe -a computername

Using Telnet to test SMTP

We normally use telnet to open a connection to the SMTP server to test sending mail by hand. Below is an example of the commands used.


telnet mail.domain 25
Trying ???.???.???.???...
Connected to mail.domain.
Escape character is '^]'.
220 mail.domain ESMTP Sendmail ?version-number?; ?date+time+gmtoffset?


Be polite and give the SMTP server a 'hello', although the mail server will take your word for it, as per RFC 821 and RFC 1123.


HELO local.domain.name
250 mail.domain Hello local.domain.name [loc.al.i.p], pleased to meet you


Let's send an email.

MAIL FROM: mail@domain.com
250 2.1.0 mail@domain.com... Sender ok
RCPT TO: mail@otherdomain.com
250 2.1.0 mail@otherdomain.com... Recipient ok


If it doesn't, see the list of possible problems below.

To start composing the message, issue the command DATA.

If you want a subject for your email, type Subject:-type subject here- then press enter twice (the blank line after the headers is needed to conform to RFC 822).

You may now proceed to type the body of your message.

To tell the mail server that you have completed the message, enter a single "." on a line of its own.


.
250 2.0.0 ???????? Message accepted for delivery


You can close the connection by issuing the QUIT command.

quit
221 2.0.0 mail.domain.ext closing connection
Connection closed by foreign host.



Here is a list of problems that I have encountered before.

The domain that you are sending from must exist

501 nouser@nosuchplace.here... Sender domain must exist

A recipient has been specified before a sender.

503 Need MAIL before RCPT


The mail server has refused to relay mail for you. This may be for any number of reasons, but typical reasons include:
Not using this provider for your internet connection, and/or
Not using an email address provided by the owner of the server.
An ACL in the mail configuration.

550 mail@domain.ext... Relaying Denied

Solaris NIC speed and duplex settings

In Solaris 10, we have this very useful command to check the state of the NICs.


# dladm show-dev
bge0 link: up speed: 1000 Mbps duplex: full
bge1 link: unknown speed: 0 Mbps duplex: unknown
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: unknown speed: 0 Mbps duplex: half
e1000g2 link: up speed: 1000 Mbps duplex: full
e1000g3 link: up speed: 100 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: up speed: 1000 Mbps duplex: full
e1000g4 link: up speed: 1000 Mbps duplex: full
e1000g5 link: unknown speed: 0 Mbps duplex: half

# dladm show-link
bge0 type: non-vlan mtu: 1500 device: bge0
bge1 type: non-vlan mtu: 1500 device: bge1
e1000g0 type: non-vlan mtu: 1500 device: e1000g0
e1000g1 type: non-vlan mtu: 1500 device: e1000g1
e1000g2 type: non-vlan mtu: 1500 device: e1000g2
e1000g3 type: non-vlan mtu: 1500 device: e1000g3
bge2 type: non-vlan mtu: 1500 device: bge2
bge3 type: non-vlan mtu: 1500 device: bge3
e1000g4 type: non-vlan mtu: 1500 device: e1000g4
e1000g5 type: non-vlan mtu: 1500 device: e1000g5
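
On a box with many NICs it is handy to filter the show-dev listing down to the links that need attention. A small sketch, with a made-up function name, that assumes the "dladm show-dev" column layout shown above (the Solaris 10 format):

```shell
# Sketch: read "dladm show-dev" output on stdin and print the links that are
# up but not running at full duplex. Field positions follow the listing
# above: $1 name, $3 link state, $5 speed, $8 duplex. Hypothetical helper.
not_full_duplex() {
    awk '$3 == "up" && $8 != "full" { print $1 ": " $5 " Mbps, duplex " $8 }'
}

# Possible use on the host (commented out, not run here):
#   dladm show-dev | not_full_duplex
```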


Here are a few good places to check out:

http://www.brandonhutchinson.com/Solaris_NIC_speed_and_duplex_settings.html
http://forum.java.sun.com/thread.jspa?threadID=5084843

Match User Identity

Here's a little snippet to check for the correct user ID before your script goes on with its task. I have used this so far in ksh scripts.


# Only the root user can run the ndd commands
if [ "`/usr/bin/id | /usr/bin/cut -c1-5`" != "uid=0" ] ; then
echo "You must be the root user to run `basename $0`."
exit 1
fi


Sometimes I use this instead:


# Only the root user can run the ndd commands
if [ "`/usr/ucb/whoami`" != "root" ] ; then
echo "You must be the root user to run `basename $0`."
exit 1
fi

Setting up SSL Certificate in MQ6

To provide more secure communication between MQ channels, we may need to put in SSL protection. In brief, here are the steps.

1) Create a key store (key.kdb is the default name)


# gsk7cmd -keydb -create -db <path>/key.kdb -pw <password> -type cms -expire <days> -stash


** It is very IMPORTANT to stash the password, otherwise MQ will not know what password to use. The password is stashed in key.sth in the same location.

2) Generate a certificate request (CSR)

# gsk7cmd -certreq -create -db <key database> -pw <password> -label <label> -dn "<distinguished name>" -size <key size> -file <request file>


^^ dn --> distinguished name. X.500 distinguished name enclosed in double quotes.
Note that only the CN attribute is required.
You can supply multiple OU attributes.

*** For an MQ server, the label must be ibmwebspheremq followed by the queue manager name, e.g. ibmwebspheremqqmgrname, all in lower case. Follow this exactly; do not try to "learn" it the hard way. For MQ clients, use ibmwebspheremq followed by the user ID.

3) Send the request to the CA to be signed.

4) Add the certificate signed by the CA to MQ6.

# gsk7cmd -cert -receive -file <signed certificate> -db <key database> -pw <password> -format ascii

[ Option -add --> add a CA cert so that the signer is trusted]
[ Option -receive --> receive a cert signed by a CA]

4a) You may want to check and display the certificate. Check that the Subject and Issuer are different.

# gsk7cmd -cert -list -db <path>/key.kdb -pw <password>
# gsk7cmd -cert -details -db <path>/key.kdb -pw <password> -label <label>


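5a) Alter the MQ6 key repository location so that MQ is informed. The value is the full path to the key store without the .kdb extension.

# runmqsc <queue manager name>
# ALTER QMGR SSLKEYR('<path>/ssl/key')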

5b) Refresh the security setting in MQ server.

# refresh security type(ssl)


6) Configure the MQ6 channels that need SSL.

# runmqsc
# alter chl('<channel name>') chltyp(sdr) sslciph('<cipher spec>')
# alter chl('<channel name>') chltyp(rcvr) sslciph('<cipher spec>')

e.g. Type of cipher -- TLS_RSA_WITH_AES_128_CBC_SHA


7) Restart the channels for the changes to take effect. Check the status to see if it's successful.

# stop chl('<channel name>')
# start chl('<channel name>')
# dis chs(*)
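
Since the label in step 2 has to be exact, it may be safer to generate it than to type it. A tiny sketch (the function name mq_cert_label is made up) that builds the label from a queue manager name or user ID:

```shell
# Sketch: build the certificate label described in step 2, i.e.
# "ibmwebspheremq" followed by the queue manager name (or user ID) folded
# to lower case. Hypothetical helper, not an MQ or GSKit command.
mq_cert_label() {
    printf 'ibmwebspheremq%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}
```

The output can then be passed to the -label option of the gsk7cmd commands above.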


Tip:
In order for gsk7cmd to run properly, you will need to set the environment:

# export JAVA_HOME=/opt/mqm/ssl/
# export PATH=$PATH:/bin:/usr/bin


Here are additional commands you can use..

For displaying of the certificates,

[Cert that you added in] gsk7cmd -cert -list personal -db key.kdb -pw xxxxxxx
[All Cert in the DB] gsk7cmd -cert -list -db key.kdb -pw xxxxxxx
[To show cert details] gsk7cmd -cert -details -db key.kdb -pw xxxxxxx -label ibmwebspheremqqmgrname
[Extract cert from DB] gsk7cmd -cert -extract -db key.kdb -pw xxxxxxx -label ibmwebspheremqqmgrname -target Cert.txt -format ascii
[To check cert validity] gsk7cmd -cert -list all -expiry 720 -db key.kdb -pw xxxxxxx

To import certificate,

[Import] gsk7cmd -cert -import -file ibmwebspheremq_qmgr.p12 -pw xxxxxxx -type pkcs12 -target key.kdb -target_pw yyyyyyy -target_type cms -label ibmwebspheremqqmgrname
[Import with label change] gsk7cmd -cert -import -file ibmwebspheremq_pqmgr.p12 -pw xxxxxxx -type pkcs12 -target key.kdb -target_pw yyyyyyy -target_type cms -label SSLcert_MQ6 -new_label ibmwebspheremqqmgrname

To Export certificate,

[Export to file] gsk7cmd -cert -export -db key.kdb -pw xxxxxxx -label SSLcert_MQ6 -type cms -target ibmwebspheremqqmgrname -target_pw xxxxxxx -target_type pkcs12

To delete certificate,

[Delete from db] gsk7cmd -cert -delete -db key.kdb -pw xxxxxxx -label ibmwebspheremqqmgrname

Reference for this installation...
http://www.ibm.com/developerworks/websphere/library/techarticles/0611_yue/0611_yue.html
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzas.doc/sy12350_.htm
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/index.jsp?topic=/com.ibm.mq.csqzas.doc/sy12340_.htm
http://www-1.ibm.com/support/docview.wss?uid=swg21113368
http://middleware.its.state.nc.us/middleware/Documentation/en_US/htm/csqzas00/csqzas001x.htm
http://hursleyonwmq.wordpress.com/tag/webspheremq/
http://hursleyonwmq.wordpress.com/2007/02/16/do-you-have-to-specify-an-ssl-certificate-label/

When was the Channel last used?

There are times when we need to check when a channel was last used, for example before a shutdown.

Here's how: ask for the last message date (LSTMSGDA) and last message time (LSTMSGTI) in the channel status.

 
dis chs(*) lstmsgda lstmsgti
7 : dis chs(*) lstmsgda lstmsgti
AMQ8417: Display Channel Status details.
CHANNEL(xxx) CHLTYPE(RCVR)
CONNAME(10.216.121.2) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.36.42)
RQMNAME(xx1) STATUS(RUNNING)
SUBSTATE(RECEIVE) XMITQ( )
AMQ8417: Display Channel Status details.
CHANNEL(yyy) CHLTYPE(RCVR)
CONNAME(10.216.121.2) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.36.42)
RQMNAME(xx1) STATUS(RUNNING)
SUBSTATE(RECEIVE) XMITQ( )
AMQ8417: Display Channel Status details.
CHANNEL(xxx) CHLTYPE(SDR)
CONNAME(name(1415)) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.11.48)
RQMNAME(xx1) STATUS(RUNNING)
SUBSTATE(MQGET) XMITQ(xx.queue)
AMQ8417: Display Channel Status details.
CHANNEL(yyy) CHLTYPE(SDR)
CONNAME(name(1415)) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.11.35)
RQMNAME(yy1) STATUS(RUNNING)
SUBSTATE(MQGET) XMITQ(yy.queue)
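
If there are many channels, the listing above gets long. As a rough sketch (not part of MQ itself), the output can be saved to a file or captured from runmqsc and parsed to pull out the last message timestamp per channel. The function name `last_message_times` and the `SAMPLE` text are my own for illustration; the sample simply copies the shape of the output shown above.

```python
import re
from datetime import datetime

# Matches KEYWORD(value) pairs as printed by runmqsc, allowing one level of
# nested parentheses so that CONNAME(name(1415)) is read as "name(1415)".
ATTR = re.compile(r"([A-Z]+)\(((?:[^()]|\([^()]*\))*)\)")


def last_message_times(mqsc_output):
    """Return a list of (channel, chltype, last_message_datetime) tuples."""
    results = []
    # Each channel status record starts with an AMQ8417 header line.
    for record in mqsc_output.split("AMQ8417:")[1:]:
        attrs = {key: value.strip() for key, value in ATTR.findall(record)}
        date = attrs.get("LSTMSGDA", "")
        time = attrs.get("LSTMSGTI", "")
        if not date or not time:
            continue  # skip records without a usable timestamp
        stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H.%M.%S")
        results.append((attrs.get("CHANNEL"), attrs.get("CHLTYPE"), stamp))
    return results


# Hypothetical sample in the same shape as the listing above.
SAMPLE = """\
AMQ8417: Display Channel Status details.
CHANNEL(xxx) CHLTYPE(RCVR)
CONNAME(10.216.121.2) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.36.42)
RQMNAME(xx1) STATUS(RUNNING)
SUBSTATE(RECEIVE) XMITQ( )
AMQ8417: Display Channel Status details.
CHANNEL(yyy) CHLTYPE(SDR)
CONNAME(name(1415)) CURRENT
LSTMSGDA(2007-07-06) LSTMSGTI(11.11.35)
RQMNAME(yy1) STATUS(RUNNING)
SUBSTATE(MQGET) XMITQ(yy.queue)
"""

if __name__ == "__main__":
    for channel, chltype, stamp in last_message_times(SAMPLE):
        print(channel, chltype, stamp.isoformat())
```

Sorting the returned list by the timestamp is an easy way to see which channels have been idle the longest.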

Troubleshooting AMQ5882 error

This error points to messages overflowing from a full queue into the system dead-letter queue. Here's the symptom.


> tail -50 AMQERR01.LOG
----- amqccita.c : 3263 --------------------------------------------
08/02/07 21:38:48 - Process(20733.10) User(mqm) Program(amqfcxba)

AMQ5882: WebSphere MQ Publish/Subscribe broker has written a message to the
dead-letter queue.

EXPLANATION:
The broker has written a message to the dead-letter queue
(SYSTEM.DEAD.LETTER.QUEUE ) for reason
'2053:MQRC_Q_FULL'. Note. To save log space, after the first occurrence of this
message for stream SYSTEM.BROKER.DEFAULT.STREAM ), it will only be written periodically.
ACTION:
If the message was not deliberately written to the dead-letter queue,
for example by a message broker exit, determine why the message was written to the dead-letter queue,
and resolve the problem that is preventing the message from being sent to its destination.



Explanation:
The broker is probably putting the subscriber's publications on the dead-letter queue.
(This is supported by the fact that the depth of the DLQ keeps increasing.)

There might be a problem with the subscriber's queue.
For example, it might be put-inhibited or the publications might be too large for the queue.
In this case the broker, by default, puts these messages to the dead-letter queue (DLQ).
Check the DLQ at the subscriber's broker. The broker also issues message AMQ5882 if
it has to put a message to the DLQ.



We check further by displaying the current depth of all local queues.


1 : dis ql(*) curdepth
...
...
AMQ8409: Display Queue details.
QUEUE(yyy)
TYPE(QLOCAL) CURDEPTH(0)
AMQ8409: Display Queue details.
QUEUE(xxx)
TYPE(QLOCAL) CURDEPTH(5000)
AMQ8409: Display Queue details.
QUEUE(zzz)
TYPE(QLOCAL) CURDEPTH(0)
… … … …
… … … …


Notice that one of the queues has a very high CURDEPTH, so we investigate that queue further.



dis ql('xxx')
1 : dis ql('SYSTEM.xxxx')
AMQ8409: Display Queue details.
QUEUE(xxx)
TYPE(QLOCAL) ACCTQ(QMGR)
ALTDATE(2007-04-09) ALTTIME(09.44.28)
BOQNAME( ) BOTHRESH(0)
CLUSNL( ) CLUSTER( )
CLWLPRTY(0) CLWLRANK(0)
CLWLUSEQ(LOCAL) CRDATE(2007-04-09)
CRTIME(09.44.28) CURDEPTH(5000)
DEFBIND(OPEN) DEFPRTY(0)
DEFPSIST(NO) DEFSOPT(SHARED)
DEFTYPE(PERMDYN)
DESCR(Websphere MQ - JMS Classes - Model queue)
DISTL(NO) GET(ENABLED)
HARDENBO INITQ( )
IPPROCS(0) MAXDEPTH(5000)
MAXMSGL(15728640) MONQ(QMGR)
MSGDLVSQ(PRIORITY) NOTRIGGER
NPMCLASS(NORMAL) OPPROCS(1)
PROCESS( ) PUT(ENABLED)
QDEPTHHI(80) QDEPTHLO(20)
QDPHIEV(DISABLED) QDPLOEV(DISABLED)
QDPMAXEV(ENABLED) QSVCIEV(NONE)
QSVCINT(999999999) RETINTVL(999999999)
SCOPE(QMGR) SHARE
STATQ(QMGR) TRIGDATA( )
TRIGDPTH(1) TRIGMPRI(0)
TRIGTYPE(FIRST) USAGE(NORMAL)


Notice that CURDEPTH = MAXDEPTH: we have a queue that is full! IPPROCS(0) also tells us that nothing currently has the queue open for reading. From here, we should check whether the receiving MQ is down or whether there is a network issue. It may also be that the backend processes have died.

Be sure to check the impact before you restart the receiving MQ channel or the backend processes. Otherwise you may cause more damage to the whole business flow, especially if the messages are critical.
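
The CURDEPTH versus MAXDEPTH comparison above can also be done for every queue in one go. Below is a minimal sketch, assuming you have captured the output of `dis ql(*) curdepth maxdepth` as text. The function name `full_queues`, the `threshold` parameter and the `SAMPLE` text are my own and only illustrate the idea; the sample is made up in the same shape as the runmqsc output shown earlier.

```python
import re

# Matches simple KEYWORD(value) pairs as printed by runmqsc.
ATTR = re.compile(r"([A-Z]+)\(([^()]*)\)")


def full_queues(mqsc_output, threshold=1.0):
    """Return (queue, curdepth, maxdepth) for queues at or above threshold * MAXDEPTH."""
    hits = []
    # Each queue record starts with an AMQ8409 header line.
    for record in mqsc_output.split("AMQ8409:")[1:]:
        attrs = {key: value.strip() for key, value in ATTR.findall(record)}
        if "CURDEPTH" not in attrs or "MAXDEPTH" not in attrs:
            continue  # record does not carry both depth attributes
        current = int(attrs["CURDEPTH"])
        maximum = int(attrs["MAXDEPTH"])
        if maximum > 0 and current >= threshold * maximum:
            hits.append((attrs.get("QUEUE"), current, maximum))
    return hits


# Hypothetical sample; queue names and depths are placeholders.
SAMPLE = """\
AMQ8409: Display Queue details.
QUEUE(yyy) TYPE(QLOCAL)
CURDEPTH(0) MAXDEPTH(5000)
AMQ8409: Display Queue details.
QUEUE(xxx) TYPE(QLOCAL)
CURDEPTH(5000) MAXDEPTH(5000)
AMQ8409: Display Queue details.
QUEUE(zzz) TYPE(QLOCAL)
CURDEPTH(0) MAXDEPTH(5000)
"""

if __name__ == "__main__":
    for queue, current, maximum in full_queues(SAMPLE):
        print(queue, "is full:", current, "of", maximum)
```

Passing a lower threshold such as 0.8 would flag queues that are getting close to full, which mirrors the QDEPTHHI(80) setting seen in the queue details.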

Message sequence number error

The error below is seen in the MQ error log.


09/09/07 04:07:08 - Process(18642.58) User(mqm) Program(amqrmppa)
AMQ9526: Message sequence number error for channel 'xxxxx'.

EXPLANATION:
The local and remote queue managers do not agree on the next message sequence
number. A message with sequence number 1484994 has been sent when sequence
number 1484986 was expected.
ACTION:
Determine the cause of the inconsistency. It could be that the synchronization
information has become damaged, or has been backed out to a previous version.
If the situation cannot be resolved, the sequence number can be manually reset
at the sending end of the channel using the RESET CHANNEL command.
----- amqrmtra.c : 3812 -------------------------------------------------------
09/09/07 04:07:08 - Process(18642.58) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'xxxxx' ended abnormally.
ACTION:
Look at previous error messages for channel program 'xxxxxx' in the
error files to determine the cause of the failure.
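
Before resetting anything, it can help to know which channels are affected and how far apart the two sequence numbers are. Here is a small sketch that scans error log text for AMQ9526 entries and extracts the channel name, the sequence number sent and the one expected. The function name `sequence_errors` and the `SAMPLE` text are my own; the sample repeats the log entry shown above.

```python
import re

# The log wraps long lines, so \s+ is used between words to tolerate
# line breaks inside the message text.
SEQ_ERROR = re.compile(
    r"AMQ9526:\s+Message\s+sequence\s+number\s+error\s+for\s+channel\s+'([^']+)'"
    r".*?sequence\s+number\s+(\d+)\s+has\s+been\s+sent\s+when\s+"
    r"sequence\s+number\s+(\d+)\s+was\s+expected",
    re.DOTALL,
)


def sequence_errors(log_text):
    """Return a list of (channel, sent, expected) tuples found in the log text."""
    return [
        (channel, int(sent), int(expected))
        for channel, sent, expected in SEQ_ERROR.findall(log_text)
    ]


# Sample copied from the log entry above.
SAMPLE = """\
09/09/07 04:07:08 - Process(18642.58) User(mqm) Program(amqrmppa)
AMQ9526: Message sequence number error for channel 'xxxxx'.

EXPLANATION:
The local and remote queue managers do not agree on the next message sequence
number. A message with sequence number 1484994 has been sent when sequence
number 1484986 was expected.
"""

if __name__ == "__main__":
    for channel, sent, expected in sequence_errors(SAMPLE):
        print(channel, "sent", sent, "expected", expected, "gap", sent - expected)
```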


How do we resolve this? Here's how.


bash-2.05$ runmqsc QMGR
5724-H72 (C) Copyright IBM Corp. 1994, 2005. ALL RIGHTS RESERVED.
Starting MQSC for queue manager QMGR.



dis channel(xxxxx)
2 : dis channel(xxxxx)
reset channel(xxxxx) seqnum(1)
3 : reset channel(xxxxx)
AMQ8023: WebSphere MQ channel reset.
dis chs(xxxxx)
4 : dis chs(xxxxx)
stop channel(xxxxx)
5 : stop channel(xxxxx)
AMQ8019: Stop WebSphere MQ channel accepted.
dis chs(xxxxx)
6 : dis chs(xxxxx)
start channel(xxxxx)
7 : start channel(xxxxx)
AMQ8018: Start WebSphere MQ channel accepted.
dis chs(xxxxx)
8 : dis chs(xxxxx)