Monday, July 16, 2012

How to Configure WebSphere Application Server hung thread detector to automatically produce javacores or thread dump

Whenever WebSphere Application Server has a thread that runs for a long time (600s by default), it reports it with a WSVR0605W message. This is similar to the behaviour we see in WebLogic.

WSVR0605W: Thread has been active for

According to the following source, it is possible to get WebSphere Application Server to generate a javacore when a potentially hung thread is reported. On Solaris, we call this a thread dump.

Source : [http://www-01.ibm.com/support/docview.wss?uid=swg21448581]

This core file can be helpful in troubleshooting server hangs and performance issues. If the jobs running in your system usually take a long time, you may want to raise the monitoring threshold above 600s, otherwise you might get false reports.

The website describes that the property "com.ibm.websphere.threadmonitor.dump.java" should be enabled.

Steps to enable auto thread dump.

1.  Log in to the administrative console, click Servers > Application Servers > server_name.
2.  Under Server Infrastructure, click Administration > Custom Properties.
3.  Click New.
4.  Add the following property:
    Name: com.ibm.websphere.threadmonitor.dump.java
    Value: true
5.  Click Apply.
6.  Click OK and save the configuration changes.
7.  Restart the Application Server for the changes to take effect.

Done.

In case you want to manually trigger a thread dump, try kill -3 on the process ID of the application server.
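Before tuning the threshold, it can help to see which threads the detector has already flagged. The following is a minimal Python sketch that pulls WSVR0605W entries out of log lines. The full message layout in the regular expression and the sample line are assumptions for illustration (the message is only partly quoted in this post), so adjust the pattern to what your SystemOut.log actually contains.

```python
import re

# ASSUMPTION: the layout of the WSVR0605W message below is illustrative,
# not taken from this post. Adapt the pattern to your own log lines.
HUNG_RE = re.compile(
    r'WSVR0605W: Thread "(?P<name>[^"]+)" \((?P<tid>[0-9a-fA-F]+)\) '
    r'has been active for (?P<ms>\d+) milliseconds'
)


def find_hung_threads(lines):
    """Return (thread name, thread id, active milliseconds) per matching line."""
    hits = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), match.group("tid"), int(match.group("ms")))
            )
    return hits


# Hypothetical sample input: one hung-thread report and one unrelated line.
sample_lines = [
    '[7/16/12 10:00:00:000 SGT] 00000012 ThreadMonitor W   WSVR0605W: '
    'Thread "WebContainer : 0" (00000029) has been active for 600000 '
    'milliseconds and may be hung.',
    '[7/16/12 10:00:01:000 SGT] 00000000 AdminTool     A   some other message',
]

for name, tid, ms in find_hung_threads(sample_lines):
    print(f"{name} ({tid}) active for {ms / 1000:.0f}s")
```

If most of the reported threads belong to jobs that are known to run long, that is a hint the threshold is too low rather than that the server is hanging.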

Wednesday, July 11, 2012

Restarting Application Server by Node Agent

I learnt that in WebSphere Application Server 7, by default, the node agent will not take any action when an application server fails.

In order to get the node agent to monitor and automatically restart a failed application server instance, we must set up the monitoring policy for that application server.

Go to the deployment manager console, and do the following:

1.  Select the application server –> Java and Process Management –> Monitoring Policy
2.  Check the "Automatic Restart" box
3.  In the "Node Restart State", set the state to "STOPPED"

Whenever an application server fails or is killed, the node agent will now automatically restart it.

If the state is set to "RUNNING", not only will the node agent restart a failed or killed application server, it WILL ALSO auto start the application server upon a node agent restart. 

Tuesday, July 10, 2012

Resolving ADMR0104E for Application Server


This write-up records the resolution for the ADMR0104E error encountered by WebSphere Application Server during start up. With this error, the application server is eventually unable to start up.

From "SystemOut.log", we see that the system is unable to read some properties file.

[6/27/12 12:08:35:103 SGT] 00000000 FileDocument  E   ADMR0104E: The system is unable to read document cells/Cell01/nodes/Node01/node-metadata.properties: java.io.IOException: Permission denied
        at java.io.File.checkAndCreate(File.java:1715)
        at java.io.File.createTempFile(File.java:1803)
        at com.ibm.ws.management.repository.FileDocument.createTempFile(FileDocument.java:564)
        at com.ibm.ws.management.repository.FileDocument.read(FileDocument.java:500)
        at com.ibm.ws.management.repository.FileRepository.extractInternal(FileRepository.java:1134)


Some research and checks revealed that the permissions on the temp directory under the application server profile had been changed. The application server was then no longer able to write to the node's temp directory shown below.

# ls -ltr
total 0
drwxr-xr-x    3 root     system          256 Jun 27 11:54 download


The cause was that the application server had been started as root. That is why the temp directory above is owned by root.

It is worth checking the ffdc directory as well.

# ls -l //AppSrv01/logs/ | grep ffdc
drwxr-xr-x    2 appusr   appgrp        49152 Jun 27 13:50 ffdc


According to research on the internet, the directory owner and the process execution user should be in the same group, and the permissions should be at least 774. To be fail safe, change the ownership/group as required under //profiles/ and //profiles/.

Once the ownership is reverted to "appusr", we should see the result below.

# chown -R appusr:appgrp  download

# ls -ltr
total 0
drwxr-xr-x    3 appusr   appgrp          256 Jun 27 11:54 download


The Application server is able to start up now.

[6/27/12 12:21:56:692 SGT] 00000000 AdminTool     A   ADMU3000I: Server appsrv open for e-business; process id is 4128910
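To catch this kind of ownership drift before the next restart, the profile tree can be scanned for entries that are not owned by the expected user. The following is a minimal Python sketch for Unix-like systems. The function name and the idea of passing a numeric uid are choices made for this example, not part of any WebSphere tooling.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return paths under root (directories and files) not owned by expected_uid."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits every directory once, so checking dirpath covers
        # all directories, and filenames covers the files inside each one.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat: report the owner of a symlink itself, not of its target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched


# Example usage (hypothetical profile path, user name as used in this post):
#   import pwd
#   uid = pwd.getpwnam("appusr").pw_uid
#   for path in find_foreign_owned("/path/to/profiles/AppSrv01", uid):
#       print(path)
```

Anything it prints is a candidate for the chown shown above.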

To compare the user that the application server process runs as against the file system permissions, do the following:

1. Open the admin console
2. Open Servers –> Application Servers –>
3. Open Java and Process Management –> Process Execution
4. Look for username and group of executing user

That's all.

Monday, July 09, 2012

Recover WebSphere password

I googled and found this interesting way to recover a WebSphere 7 password.

To encode a password, we have:

//java/bin/java -Djava.ext.dirs=//deploytool/itp/plugins/com.ibm.websphere.v7_7.0.1.v20100710_0411/wasJars/ -cp securityimpl.jar:iwsorb.jar  com.ibm.ws.security.util.PasswordEncoder secret


The output is

decoded password == "secret", encoded password == "{xor}LDo8LTor"

Hence, you can use the same method to decode the encoded password.

//java/bin/java -Djava.ext.dirs=//deploytool/itp/plugins/com.ibm.websphere.v7_7.0.1.v20100710_0411/wasJars/ -cp securityimpl.jar:iwsorb.jar  com.ibm.ws.security.util.PasswordDecoder {xor}LDo8LTor

The output is

encoded password == "{xor}LDo8LTor", decoded password == "secret"
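The {xor} format is simple enough to reproduce by hand. The following is a minimal Python sketch, assuming each byte of the password is XORed with the underscore character (0x5F) and the result is Base64 encoded. That key is an assumption rather than something stated in this post, but it does reproduce the "secret" / "{xor}LDo8LTor" pair shown above. The function names are made up for this example.

```python
import base64

XOR_KEY = 0x5F  # ASCII '_'; assumed key, consistent with the pair in this post
PREFIX = "{xor}"


def was_xor_encode(plain):
    """Encode a clear-text password into the {xor} form."""
    masked = bytes(b ^ XOR_KEY for b in plain.encode("utf-8"))
    return PREFIX + base64.b64encode(masked).decode("ascii")


def was_xor_decode(encoded):
    """Decode a {xor} string back into clear text."""
    if not encoded.startswith(PREFIX):
        raise ValueError("not an {xor}-encoded value")
    masked = base64.b64decode(encoded[len(PREFIX):])
    return bytes(b ^ XOR_KEY for b in masked).decode("utf-8")


print(was_xor_encode("secret"))         # {xor}LDo8LTor
print(was_xor_decode("{xor}LDo8LTor"))  # secret
```

Since no secret key is involved, anyone who can read the encoded value can recover the password, which is why this is encoding rather than encryption.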

In case you are wondering, you can also update the password for the deployment manager and nodes without knowing the current password. Check out /.../config/cells//security.xml. :)

Friday, July 06, 2012

Encrypting the ID and Password for Websphere Application Server

By default, you need to supply the ID and password when starting up or shutting down the deployment manager, node or application server. Example commands are shown below.

Deployment Manager
//bin/startManager.sh -username XXX -password XXX

Node
//bin/startNode.sh -username XXX -password XXX

Application Server
//bin/startServer.sh -username XXX -password XXX

The steps to store the ID and the encoded password are as follows.

Insert the ID and password in clear text into the SOAP properties file at //properties/soap.client.props.

# grep SOAP.login soap.client.props | grep -v "#"
com.ibm.SOAP.loginUserid=wasadm
com.ibm.SOAP.loginPassword=wasadm
com.ibm.SOAP.loginSource=prompt


We use the IBM provided script to encode the password.

//bin/PropFilePasswordEncoder.sh //profiles/default/properties/soap.client.props com.ibm.SOAP.loginPassword -Backup

Taking a look at the same property file again, the password is now encoded.

# grep SOAP.login soap.client.props | grep -v "#"
com.ibm.SOAP.loginUserid=wasadm
com.ibm.SOAP.loginPassword={xor}Es4zPjwS
com.ibm.SOAP.loginSource=prompt
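When there are several profiles, it is easy to miss one where the password is still in clear text. The following is a minimal Python sketch that parses the properties text and checks whether the password value carries the {xor} prefix. The helper names are made up for this example, and the parser only handles simple key=value lines, not the full Java properties syntax.

```python
PASSWORD_KEY = "com.ibm.SOAP.loginPassword"


def read_props(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def password_is_encoded(props, key=PASSWORD_KEY):
    """True if the stored password value starts with the {xor} prefix."""
    return props.get(key, "").startswith("{xor}")


# The two states of the file as shown in this post.
before = """com.ibm.SOAP.loginUserid=wasadm
com.ibm.SOAP.loginPassword=wasadm
com.ibm.SOAP.loginSource=prompt
"""
after = """com.ibm.SOAP.loginUserid=wasadm
com.ibm.SOAP.loginPassword={xor}Es4zPjwS
com.ibm.SOAP.loginSource=prompt
"""

print(password_is_encoded(read_props(before)))  # False
print(password_is_encoded(read_props(after)))   # True
```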


Now we can start up and shut down WebSphere without supplying the ID and password on the command line.

su wasadm -c "//bin/startManager.sh"
su wasadm -c "//bin/startNode.sh"
su wasadm -c "//bin/startServer.sh "



su wasadm -c "//bin/stopManager.sh"
su wasadm -c "//bin/stopNode.sh"
su wasadm -c "//bin/stopServer.sh "

end.

======================

Some trivia.
Why does IBM prefer XOR over a stronger algorithm, the way WebLogic uses 3DES? XOR is only good enough to prevent casual snooping.


Someone demonstrated this with an online decoder

http://www.poweredbywebsphere.com/decoder.html

Thursday, July 05, 2012

Change WebSphere Ports without Reinstalling

Scenario: you have WebSphere Application Server 7 installed as ND. If the cells are using default ports on the same host and you want to access the different cells concurrently, you may want to change the ports on one of the cells.

1.  Go to the master config repository for the server ports (Dmgr profiles directory)

2.  Back up the current serverindex.xml

3.  Edit each of the ports in this file. (Dmgr will use the new ports)

4.  Repeat this process for all nodes in the master repository (Node profiles directories)

5.  For all cells, back up virtualhosts.xml and edit all the ports in it. (Nodes will use these ports to connect with Dmgr)

6.  Start the dmgr (startManager.sh)

7.  For each node, execute a syncNode so that the nodes get their new port assignments from the master repository

//bin/syncNode.sh

Use the new SOAP ports set in step #3.

8.  Start up each node

9.  Start up each application server.

Confirm which new ports you want to use before you start.
To make them easier to remember, keep the usual 80 at the end and prepend digits, e.g. 9080, 19080, etc.
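Before editing, it helps to list the ports currently assigned so that nothing is missed or duplicated. The following is a minimal Python sketch that reads a serverindex.xml style document and prints each endpoint with its port. The element and attribute names (serverEntries, specialEndpoints, endPoint, serverName, endPointName, port) and the sample content are assumptions about the file layout made for illustration, so compare them against your own serverindex.xml first.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a cut-down sample in the shape serverindex.xml is expected to
# have. Host name and port values here are illustrative only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<serverindex:ServerIndex
    xmlns:xmi="http://www.omg.org/XMI"
    xmlns:serverindex="http://www.ibm.com/websphere/appserver/schemas/5.0/serverindex.xmi"
    hostName="host1">
  <serverEntries serverName="dmgr">
    <specialEndpoints endPointName="SOAP_CONNECTOR_ADDRESS">
      <endPoint host="host1" port="8879"/>
    </specialEndpoints>
    <specialEndpoints endPointName="WC_adminhost">
      <endPoint host="*" port="9060"/>
    </specialEndpoints>
  </serverEntries>
</serverindex:ServerIndex>
"""


def list_ports(xml_text):
    """Return (server name, endpoint name, port) for every endpoint found."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ports = []
    for entry in root.iter("serverEntries"):
        for special in entry.iter("specialEndpoints"):
            for end_point in special.iter("endPoint"):
                ports.append(
                    (
                        entry.get("serverName"),
                        special.get("endPointName"),
                        int(end_point.get("port")),
                    )
                )
    return ports


for server, name, port in list_ports(SAMPLE):
    print(f"{server:10} {name:28} {port}")
```

The sketch only reads the file. Doing the actual edits by hand on a backed-up copy, as in the steps above, keeps the change easy to review and revert.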


Done.

Monday, July 02, 2012

How to resolve LVM error in PowerHA

In the event you run into the following error:

    cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group. 

This may be caused by the volume group being varied on at the other node. If it should not be varied on there, run the following on that node:

    # varyoffvg vg 

And then retry the LVM command.

If it continues to be a problem, stop PowerHA 7.1 on both nodes, export the volume group and re-import it on both nodes, and then restart the cluster.