Tuesday, November 27, 2018

Oracle RAC: Using a Second NIC for Interconnect HA

Introduction


One of the most important features of Oracle RAC is high availability: the more resilient your components are, the better your Clusterware copes with failures. For Oracle Clusterware the interconnect network plays a big role in your environment; suppose you lose connectivity on the private network, Oracle will choose some nodes to be evicted. For that reason we should look a little closer at this.

To make the network resilient we use link aggregation, which can be implemented over a variety of hardware components such as NICs and network switches. We can also use OS techniques to implement link aggregation, like bonding, where the system administrator is responsible for guaranteeing the resilience of this network. There is nothing wrong with that approach, and a large number of environments use it, but Oracle has its own approach too.
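Just for comparison, this is roughly what the OS-level approach looks like. Below is a minimal sketch of an active-backup bond on RHEL/Oracle Linux 7 using network-scripts; the device names, IP address and bonding options are illustrative only, not taken from this environment.

# /etc/sysconfig/network-scripts/ifcfg-bond0   (illustrative values)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.0.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-enp0s8   (one of the slave NICs)
DEVICE=enp0s8
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

With bonding, Clusterware only sees the single bond0 interface and the failover is handled entirely by the OS.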

Oracle High Availability IP


Starting with Oracle 11.2.0.2 we can use HAIP (Highly Available IP) instead of the OS method. For this we have to configure a second network card on a different subnet from the first interconnect, which is what guarantees HA. Keep in mind that you should use the same MTU and the same network interface names on all nodes.

You can have up to four active devices for HAIP. You can configure more, but Oracle will only use four at a time; if you lose one, Oracle will choose another configured device to replace it. Even in the case of a failure of a single device you will not suffer bounces or disconnects, and Oracle will remain available on all nodes.
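Under the covers, HAIP works by assigning link-local 169.254.x.x addresses on top of the private interfaces, and it is these addresses that the database and ASM instances actually use for interconnect traffic. A quick way to see which addresses each instance is using is the gv$cluster_interconnects view (the addresses you see will obviously be your own):

SQL> select inst_id, name, ip_address, source from gv$cluster_interconnects;

With HAIP in place you should see 169.254.* addresses listed against sub-interfaces of the private NICs instead of the physical private IPs.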

Checking the current environment


We can use oifcfg getif to verify all the networks used by Oracle Clusterware. The main idea is to add the 192.168.2.0 subnet as a second interconnect network.

[oracle@srv-ora-rac01 ~]$ oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
[oracle@srv-ora-rac01 ~]$ 
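If you are not sure which interfaces and subnets each node can actually see, oifcfg iflist lists them (adding -p -n also prints the subnet type and netmask). Note that the PRIVATE/PUBLIC flag in that output refers to the addressing range, not to the cluster role of the interface. The output below is just an illustration of the format for this environment:

[oracle@srv-ora-rac01 ~]$ oifcfg iflist -p -n
enp0s3  192.168.1.0  PRIVATE  255.255.255.0
enp0s8  192.168.0.0  PRIVATE  255.255.255.0
enp0s9  192.168.2.0  PRIVATE  255.255.255.0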

Check that the new private IPs are reachable on both nodes:

[root@srv-ora-rac01 ~]# ping srv-ora-rac01-priv2
PING srv-ora-rac01-priv2 (192.168.2.74) 56(84) bytes of data.
64 bytes from srv-ora-rac01-priv2 (192.168.2.74): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from srv-ora-rac01-priv2 (192.168.2.74): icmp_seq=2 ttl=64 time=0.039 ms
^C
--- srv-ora-rac01-priv2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1053ms
rtt min/avg/max/mdev = 0.039/0.044/0.049/0.005 ms

[root@srv-ora-rac01 ~]# ping srv-ora-rac02-priv2
PING srv-ora-rac02-priv2 (192.168.2.75) 56(84) bytes of data.
64 bytes from srv-ora-rac02-priv2 (192.168.2.75): icmp_seq=1 ttl=64 time=1.98 ms
64 bytes from srv-ora-rac02-priv2 (192.168.2.75): icmp_seq=2 ttl=64 time=0.253 ms
^C
--- srv-ora-rac02-priv2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.253/1.119/1.985/0.866 ms
[root@srv-ora-rac01 ~]# 
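Since HAIP expects the same MTU on every private interface, it is also worth a quick check on both nodes before going further (1500 here; adjust to whatever your network uses):

[root@srv-ora-rac01 ~]# ip link show enp0s8 | grep mtu
[root@srv-ora-rac01 ~]# ip link show enp0s9 | grep mtu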


Configuring HAIP


OK, now we can configure a second network for our interconnect. For this we only have to run:

[root@srv-ora-rac01 ~]# oifcfg setif -global enp0s9/192.168.2.0:cluster_interconnect
[root@srv-ora-rac01 ~]# 
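If you ever need to back this change out, the inverse operation is oifcfg delif with the same interface/subnet pair, something like:

[root@srv-ora-rac01 ~]# oifcfg delif -global enp0s9/192.168.2.0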

Now we can check the Clusterware network configuration and validate that we now have two interconnect networks:

[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect
[root@srv-ora-rac01 ~]# 

From this point we already have an interconnect capable of failover, but we need to restart CRS on all nodes to make full use of HAIP, as sketched below.
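A rolling restart, one node at a time and run as root from the Grid Infrastructure home, is enough; the path below is just an example, adjust it to your own GRID_HOME:

[root@srv-ora-rac01 ~]# /u01/app/12.2.0/grid/bin/crsctl stop crs
[root@srv-ora-rac01 ~]# /u01/app/12.2.0/grid/bin/crsctl start crs

After the stack comes back you can also confirm the HAIP resource is online in the lower stack with crsctl stat res -t -init, looking for ora.cluster_interconnect.haip.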


POC - Failure of one Network Interface


With HAIP configured we can deal with the failure of one interconnect path. I'll simulate this by bringing down one interface; the Clusterware should stay up and running:

[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect

[root@srv-ora-rac01 ~]# crsctl check cluster -all
**************************************************************
srv-ora-rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv-ora-rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@srv-ora-rac01 ~]# 

[root@srv-ora-rac01 ~]# ifconfig -a enp0s9
enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.74  netmask 255.255.255.0  broadcast 192.168.2.255
        inet6 fe80::117f:3ef7:eb06:9c26  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:e6:c9:32  txqueuelen 1000  (Ethernet)
        RX packets 4120  bytes 2929493 (2.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4645  bytes 3708582 (3.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@srv-ora-rac01 ~]# 

[root@srv-ora-rac01 ~]# ifdown enp0s9
Device 'enp0s9' successfully disconnected.
[root@srv-ora-rac01 ~]# ifconfig -a enp0s9
enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:e6:c9:32  txqueuelen 1000  (Ethernet)
        RX packets 5070  bytes 3622826 (3.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5458  bytes 4208014 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@srv-ora-rac01 ~]# 


[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect
[root@srv-ora-rac01 ~]# crsctl check cluster -all

**************************************************************
srv-ora-rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv-ora-rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

OK, nothing happened to the node; all services remained available as expected.
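If you want to watch the failover itself, one way (assuming the usual 169.254.* HAIP addresses) is to list the link-local addresses before and after the ifdown:

[root@srv-ora-rac01 ~]# ip addr show | grep "169.254"

Before the failure you should see one 169.254.* address on a sub-interface of each private NIC; after enp0s9 goes down, the address it was hosting should reappear on the surviving enp0s8, and move back once the interface is brought up again.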

Hope you enjoyed it!

Diogo 

