Sunday, September 11, 2011

How Disk Expansion on AIX can fail

IBM has been boasting about how easy disk space management is on its AIX hosts. Sure, we can expand file systems on AIX in many cases, e.g. /opt, /home and /tmp. However, expanding /opt/oracle/oradata failed. Let's see why.

The story

My oravg (the volume group holding the Oracle data) was full due to rapid business growth, so it needed more space. I expected no impact, since disk space expansion is supposed to work on the fly. I had personally done it on /tmp, / and /var, so I was pretty confident it would be OK.

The expansion completed, but the "df" command did not show the expected change. Then the phone started ringing: the Oracle DB had crashed.

The binary installation was still intact, but the DB was gone.

Troubleshooting and recovery

fsck showed that the "bit allocation map" was corrupted. I tried creating another VG and copying the LV over to it, to see whether the inodes, files, links and bit allocation map could be recreated. The map could not be recreated.

Searching Google turned up nothing useful beyond creating a snapshot to back up the system, or applying a snapshot to restore it. That is a provision for quick recovery, not what I needed.

It seems that Oracle DB and IBM DB2 are "aware" of the physical disk boundaries; when those boundaries change, they do not know how to react, and so they crash. In addition, the OS had no room to process the file system expansion, since the file system was 100% full. The next recovery step was to wipe oravg clean and rebuild the DB.

The painful part was that building the DB originally took more than 3 days, and this rebuild took another weekend. Ouch.

The takeaway

Before expanding the disk for a DB, or any other disk whose applications are "aware" of the disk setup, shut those applications down first. In addition, clear up some space beforehand so that the OS has some leeway to process the file system expansion on the fly.
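Those two rules can be turned into a small pre-flight check run before touching the volume group. This is a minimal Python sketch, not an AIX or Oracle tool: the 10% free-space threshold is an assumption (the post only says to leave "some leeway"), the function names are hypothetical, and `db_stopped` is a flag the operator sets after actually shutting the database down. It only reads free space with the standard library's `shutil.disk_usage`; it does not perform the expansion itself.

```python
import shutil

# Hypothetical threshold: the post only asks for "some leeway";
# 10% free is an illustrative assumption, not an AIX or Oracle requirement.
MIN_FREE_FRACTION = 0.10


def preflight_problems(total: int, free: int, db_stopped: bool,
                       min_free: float = MIN_FREE_FRACTION) -> list[str]:
    """Return the reasons not to expand yet; an empty list means go ahead."""
    problems = []
    # Rule 1: applications that are "aware" of the disk layout must be down.
    if not db_stopped:
        problems.append("database is still running: shut it down first")
    # Rule 2: the file system must not be (nearly) full before expansion.
    if total <= 0 or free / total < min_free:
        problems.append(f"less than {min_free:.0%} free: clear some space first")
    return problems


def preflight_path(path: str, db_stopped: bool) -> list[str]:
    """Run the checks against the file system that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return preflight_problems(usage.total, usage.free, db_stopped)
```

For example, `preflight_path("/opt/oracle/oradata", db_stopped=True)` would return an empty list only if at least 10% of that file system is free. Keeping the arithmetic in `preflight_problems` separate from the `disk_usage` call makes the rules easy to test without a real disk.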

I was lucky that this was a trial setup.

So do not blindly believe that on-the-fly disk space management comes without strings attached. There are bound to be obscure conditions that will break it.

Murphy's law: "Anything that can go wrong, will go wrong."
