Thursday, August 14, 2008

Link-based IPMP vs probe-based IPMP in Solaris 10

From the SunSolve doc http://sunsolve.sun.com/search/document.do?assetkey=1-9-86869-1, there are three modes of IPMP that we can configure:

Probe-based IPMP - active-standby setup
Probe-based IPMP - active-active setup
Link-based IPMP - active-standby setup

The only difference between the probe-based active-active setup and the active-standby setup is the word "deprecated" in the second interface's configuration file. When you add the "deprecated" tag, network traffic does NOT go out through that physical IP; when you snoop the interface, the traffic goes out on your virtual IPs instead.
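As a rough sketch of what that doc describes (hostnames and interface names here are placeholders, not tested configs), the second interface's file in the two probe-based setups might differ only in that one keyword on the data address:

/etc/hostname.bge1 (active-active):
myserver-bge1 netmask + broadcast + group production up \
addif mytest-bge1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge1 (active-standby, data address marked deprecated):
myserver-bge1 deprecated netmask + broadcast + group production up \
addif mytest-bge1 deprecated -failover netmask + broadcast + up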

Link-based IPMP

For link-based failure detection, only the link between the local interface and its link partner is checked, at the hardware layer. Neither the IP layer nor any further network path is monitored.

No test addresses are required for link-based failure detection, so the pro here is that you save on IP addresses. But then, if you are on your own private network, are you really going to run out of them? More likely the real benefit is simpler IP management.

Probe-based IPMP

Probe-based failure detection is performed on each interface in the IPMP group that has a test address. Using this test address, ICMP probe messages go out over this interface to one or more target systems on the same IP link.

The in.mpathd daemon determines which target systems to probe dynamically. The whole network path up to the gateway (router) is monitored on IP layer. With all interfaces in the IPMP group connected via redundant network paths (switches etc.), you get full redundancy.
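For comparison with the link-based setup shown later, a minimal probe-based configuration might look like this (the test-address hostnames mytest-bge0 and mytest-bge1 are placeholders of mine and must also resolve via /etc/hosts):

/etc/hostname.bge0:
myserver netmask + broadcast + group production up \
addif mytest-bge0 -failover deprecated netmask + broadcast + up

/etc/hostname.bge1:
mytest-bge1 -failover deprecated netmask + broadcast + group production up

The -failover flag marks an address as a test address that stays put during failover, and deprecated keeps outbound traffic from sourcing from it.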

On the other hand the default router can be a single point of failure, resulting in 'All Interfaces in group have failed'.

Conclusion

In short, probe-based IPMP monitors the path up to the gateway, while link-based IPMP monitors only up to the next physical link. Nothing more, nothing less.

Link-based IPMP can't 'see' anything beyond that physical link.

I still prefer probe-based IPMP, as it makes troubleshooting easier: I can tell whether I have a connection all the way to the destination. With link-based IPMP I would have to get the network guys to check for me whether the connection is down.

Note: netstat -k seems to have been dropped in Solaris 10.

Link-based IPMP on Solaris 10

Setting up link-based IPMP in Solaris 10 is much easier than probe-based IPMP.

Let's see what NICs I have in my server:


root ~>#dladm show-dev
bge0 link: up speed: 1000 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: unknown speed: 0 Mbps duplex: unknown
bge3 link: up speed: 100 Mbps duplex: full



So I have three NICs connected. Let's use bge0 and bge1 for our link-based IPMP. Just use the following configuration.


root ~># more /etc/hostname.bge*
::::::::::::::
/etc/hostname.bge0
::::::::::::::
myserver netmask + broadcast + group production up
::::::::::::::
/etc/hostname.bge1
::::::::::::::
group production up



Remember to put the IPs in /etc/hosts, the netmask in /etc/netmasks, and the default gateway in /etc/defaultrouter.
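For example, using the addresses that appear later in this post (the gateway address is an illustrative assumption, not from my actual setup):

/etc/hosts:
10.55.9.192   myserver

/etc/netmasks:
10.55.9.0     255.255.255.0

/etc/defaultrouter:
10.55.9.1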

Verify that the IPMP daemon is running.


root ~>#pgrep -lf mpathd
165 /usr/lib/inet/in.mpathd -a



Another indication that you are using link-based rather than probe-based IPMP is the following message appearing on your console or in /var/adm/messages.



Aug 14 10:41:44 in.mpathd[155]: No test address configured on interface bge1; disabling probe-based failure detection on it
Aug 14 10:41:44 in.mpathd[155]: No test address configured on interface bge0; disabling probe-based failure detection on it


Now we are ready to do some failover tests.


# if_mpadm -d bge0

root ~>#ifconfig -a
bge0: flags=89000842 mtu 0 index 2
inet 0.0.0.0 netmask 0
groupname production
ether 0:14:4f:91:d:5c
bge1: flags=1000843 mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname production
ether 0:14:4f:91:d:5d
bge1:1: flags=1000843 mtu 1500 index 3
inet 10.55.9.192 netmask ffffff00 broadcast 10.55.9.255


Notice that the IP on bge0 has been transferred to bge1:1. You may also see the following appear on your console or in /var/adm/messages.


Aug 14 10:05:10 myserver in.mpathd[165]: [ID 832587 daemon.error] Successfully failed over from NIC bge0 to NIC bge1



That's the failover done. Let's recover and restore the IP:



root ~>#if_mpadm -r bge0


In /var/adm/messages:


Aug 14 10:05:10 myserver in.mpathd[165]: [ID 832587 daemon.error] Successfully failed over from NIC bge0 to NIC bge1
Aug 14 10:07:26 myserver in.mpathd[165]: [ID 620804 daemon.error] Successfully failed back to NIC bge0


We have restored the NIC.


root ~>#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
bge0: flags=1000843 mtu 1500 index 2
inet 10.55.9.192 netmask ffffff00 broadcast 10.55.9.255
groupname production
ether 0:14:4f:91:d:5c
bge1: flags=1000843 mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
groupname production
ether 0:14:4f:91:d:5d



While doing this test, I also noticed that after the failover, bge1 itself still showed 0.0.0.0 and bge0's IP was plumbed onto bge1:1.

From some research and experimentation, I found that bge1 can actually be configured with its own IP, so that it still provides service, putting one more NIC to work.
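A sketch of that active-active link-based setup, assuming 10.55.9.198 resolves in /etc/hosts to a second hostname (myserver-bge1 here is a name I made up for illustration):

/etc/hostname.bge0:
myserver netmask + broadcast + group production up

/etc/hostname.bge1:
myserver-bge1 netmask + broadcast + group production up

Both interfaces now carry a data address in the same IPMP group, and either address fails over to the surviving NIC.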

When a failover happens, both IPs keep serving traffic, business as usual. Here's the test.

bge0 and bge1 each have an IP plumbed on them. Let's telnet to port 22 on 10.55.9.192 (bge0):


myclient -> myserver TCP D=22 S=57645 Syn Seq=2664010663 Len=0 Win=49640 Options=


Also telnet to port 22 on 10.55.9.198 (bge1):


myclient -> 10.55.9.198 TCP D=22 S=57656 Syn Seq=2669824191 Len=0 Win=49640 Options=
10.55.9.198 -> myclient TCP D=57656 S=22 Syn Ack=2669824192 Seq=478613400 Len=0 Win=49640 Options=

myclient -> 10.55.9.198 TCP D=22 S=57656 Ack=478613401 Seq=2669824192 Len=0 Win=49640
10.55.9.198 -> myclient TCP D=57656 S=22 Push Ack=2669824192 Seq=478613401 Len=20 Win=49640

myclient -> 10.55.9.198 TCP D=22 S=57656 Ack=478613421 Seq=2669824192 Len=0 Win=49640


Because I did not log in, /var/adm/messages 'complains':


Aug 14 10:16:27 myserver sshd[24576]: [ID 800047 auth.info] Did not receive identification string from 10.10.140.36
Aug 14 10:16:40 myserver sshd[24579]: [ID 800047 auth.info] Did not receive identification string from 10.10.140.36


Let's fail bge0 now and monitor /var/adm/messages and the snoop output. All the traffic is actually now going over bge1:1.


Aug 14 10:19:38 myserver in.mpathd[165]: [ID 832587 daemon.error] Successfully failed over from NIC bge0 to NIC bge1


myclient -> myserver TCP D=22 S=57682 Syn Seq=2724219601 Len=0 Win=49640 Options=
myserver -> myclient TCP D=57682 S=22 Syn Ack=2724219602 Seq=2859863037 Len=0 Win=49640 Options=


Traffic on 10.55.9.198 (bge1) is unaffected.


myclient -> 10.55.9.198 TCP D=22 S=57685 Syn Seq=2740048358 Len=0 Win=49640 Options=
10.55.9.198 -> myclient TCP D=57685 S=22 Syn Ack=2740048359 Seq=1234692052 Len=0 Win=49640 Options=

myclient -> 10.55.9.198 TCP D=22 S=57685 Ack=1234692053 Seq=2740048359 Len=0 Win=49640
10.55.9.198 -> myclient TCP D=57685 S=22 Push Ack=2740048359 Seq=1234692053 Len=20 Win=49640
myclient -> 10.55.9.198 TCP D=22 S=57685 Ack=1234692073 Seq=2740048359 Len=0 Win=49640


We restore the NIC.


Aug 14 10:35:08 myserver in.mpathd[165]: [ID 620804 daemon.error] Successfully failed back to NIC bge0



Some of my references are:
http://sunsolve.sun.com/search/document.do?assetkey=1-61-211105-1
http://raulsg.wikispaces.com/ipmp-link-based
http://os.miamano.eu/node/25