Heartbeat is a daemon that provides clustering services: it exchanges messages between the machines running it and checks their health. In this post I'll show how to configure a simple two-node cluster with failover, sharing a virtual IP. If one node fails, the other node takes over its role and the service keeps running. For this example I used Apache to obtain an active/passive service. Heartbeat is used to check whether all the nodes are alive, and it is recommended to use a dedicated interface for it.
Pacemaker is a resource manager that provides full management of the resources offered by the cluster. With Pacemaker we basically handle two types of resources: LSB resources, which are the init scripts provided by the Linux distribution and found under /etc/init.d/, and OCF resources, which provide things such as setting a virtual IP address, monitoring the health of a resource, starting/stopping a resource and so on.
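To see which resource agents are available in each class on your own system, the crm shell can list them (the exact output depends on the distribution and the installed packages):
# crm ra classes
# crm ra list lsb
# crm ra list ocf heartbeat
The first command lists the resource classes known to the cluster, the second one shows the LSB init scripts found under /etc/init.d/, and the third one shows the OCF agents shipped by the heartbeat provider.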
The configuration for our scenario is as follows:
penguin-ha1:
eth1 –> VIP (192.168.1.10)
eth2 –> 192.168.1.11 (Heartbeat)
penguin-ha2:
eth1 –> Backup
eth2 –> 192.168.1.12 (Heartbeat)
Configuring heartbeat
1.- Edit /etc/hosts:
192.168.1.11 penguin-ha1
192.168.1.12 penguin-ha2
and /etc/hostname on each node (penguin-ha1 or penguin-ha2 respectively):
penguin-ha?
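Heartbeat matches the node names declared in ha.cf against the output of uname -n, so it is worth making sure the hostname is really applied on each node before continuing. On the first node, for example (and the equivalent on penguin-ha2):
# echo penguin-ha1 > /etc/hostname
# hostname penguin-ha1
# uname -n
penguin-ha1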
2.- Install heartbeat and pacemaker:
# aptitude install heartbeat pacemaker
3.- Edit /etc/ha.d/ha.cf:
# File to write debug messages to
debugfile /var/log/ha-debug
# File to write other messages to
logfile /var/log/ha-log
# Facility to use for syslog()/logger
logfacility local0
# keepalive: how long between heartbeats?
# time units in seconds or milliseconds (ms)
keepalive 2
# deadtime: how long-to-declare-host-dead?
deadtime 30
# warntime: how long before issuing "late heartbeat" warning?
warntime 10
# initdead 120
# What UDP port to use for bcast/ucast communication?
udpport 694
# IP address of the other node (change it on every node)
ucast eth1 192.168.1.xx
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node penguin-ha1 penguin-ha2
# Hosts to try to ping
ping 192.168.1.1
# Enable Pacemaker
crm respawn
4.- Edit /etc/ha.d/authkeys:
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
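The key actually enabled here is the crc one, which only detects corruption and does not authenticate the peer. If the heartbeat traffic does not go over a dedicated crossover cable, a keyed hash is preferable; a minimal sketch of generating a random sha1 key (the key value itself is arbitrary, and the resulting file must be identical on both nodes):
# KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
# cat > /etc/ha.d/authkeys <<EOF
auth 2
2 sha1 $KEY
EOF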
5.- Set correct file permissions on authkeys:
# chmod 600 authkeys
6.- Restart Heartbeat:
# service heartbeat restart
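Before configuring Pacemaker you can verify that the two nodes see each other at the Heartbeat layer. The cl_status utility shipped with Heartbeat reports the local daemon status and the membership:
# cl_status hbstatus
# cl_status listnodes
# cl_status nodestatus penguin-ha2
The last command should report the peer node as active once both daemons are talking to each other.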
Configuring Pacemaker
1.- Verify the status of the cluster:
# crm status
============
Last updated: Wed Apr 4 17:49:36 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
2.- Disable stonith:
# crm configure property stonith-enabled=false
3.- Set the expected number of quorum votes (one per node):
# crm configure property expected-quorum-votes="2"
4.- To have quorum, more than half of the total number of cluster nodes must be online ((number of nodes / 2) + 1), which is no longer possible when a node fails in a 2-node cluster.
If you want the remaining node to keep providing all the cluster services, you need to set the no-quorum-policy to ignore:
# crm configure property no-quorum-policy=ignore
5.- To prevent automatic failback of a resource once a failed node comes back online:
# crm configure rsc_defaults resource-stickiness=100
6.- List the available scripts for the ocf class:
# crm ra list ocf
7.- Information for a script:
# crm ra info ocf:IPaddr2
8.- Add a VIP to our cluster:
# crm configure primitive havip1 ocf:IPaddr2 params ip=192.168.1.10 cidr_netmask=32 nic=eth1 op monitor interval=30s
9.- Check the status and see that the resource havip1 is started on the first node:
# crm status
============
Last updated: Wed Apr 4 19:14:23 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]

havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha1

# ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:e4:d5:27
          inet addr:192.168.1.10  Bcast:192.168.1.10  Mask:255.255.255.255
          inet6 addr: fe80::a00:27ff:fee4:d527/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11359 (11.0 KiB)  TX bytes:1536 (1.5 KiB)
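Note that the IPaddr2 agent adds the address with the ip(8) command, so depending on how it is labelled it may not always be visible in plain ifconfig output; ip addr is a more reliable way to check it:
# ip addr show eth1
The VIP 192.168.1.10 should be listed as an inet entry on eth1 of the node that runs the havip1 resource.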
Adding a daemon to our cluster
1.- See the script details:
# crm ra info ocf:anything
2.- Add the Apache daemon to our cluster:
# crm configure primitive apacheha lsb::apache2 op monitor interval=15s
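Pacemaker can only monitor an LSB resource reliably if the init script follows the LSB conventions, in particular the exit codes of the status action (0 while running, 3 when stopped). A quick sanity check on each node, ideally done before adding the resource, could be:
# /etc/init.d/apache2 status; echo $?
# /etc/init.d/apache2 stop
# /etc/init.d/apache2 status; echo $?
The first status should return 0 and the second one 3; if the cluster already manages the resource, it will simply restart Apache on the next monitor interval.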
3.- Ensure that the VIP and the Apache resources run on the same node:
# crm configure colocation apacheha-havip1 INFINITY: havip1 apacheha
4.- See the status of our cluster:
# crm status
============
Last updated: Wed Apr 4 19:32:45 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha2
apacheha (lsb:apache2): Started penguin-ha2
Configuring the start order of the services
# crm configure order ip-apache mandatory: havip1 apacheha
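As an alternative to keeping a separate colocation and order constraint, the two resources could also be put into a group, which implies both that they run on the same node and that they start in the listed order (the group name webgroup here is just an example):
# crm configure group webgroup havip1 apacheha
If you go this way, the individual apacheha-havip1 and ip-apache constraints are no longer needed.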
Migrate a resource to another node
# crm resource migrate havip1 penguin-ha1
# crm status
============
Last updated: Wed Apr 4 19:49:28 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha1
apacheha (lsb:apache2): Started penguin-ha1
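The migrate command works by inserting a location constraint (it shows up as cli-prefer-havip1 in the configuration below); once the resource has moved you may want to clear it again so the resource is free to run on either node:
# crm resource unmigrate havip1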
See our cluster configuration:
# crm configure show
node $id="2629e6cc-fd9e-48d1-b28b-ed4f49a538e4" penguin-ha1
node $id="31062a6e-231d-405a-8490-113044c5ca06" penguin-ha2
primitive apacheha lsb:apache2 \
        op monitor interval="15s"
primitive havip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.10" cidr_netmask="32" nic="eth1" \
        op monitor interval="30s"
location cli-prefer-havip1 havip1 \
        rule $id="cli-prefer-rule-havip1" inf: #uname eq penguin-ha1
colocation apacheha-havip1 inf: havip1 apacheha
order ip-apache inf: havip1 apacheha
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
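Since crm configure show prints the whole configuration in crm syntax, redirecting it to a file is also an easy way to keep a backup (the file name is just an example); the crm shell's configure load command should be able to import it again on a rebuilt cluster:
# crm configure show > /root/penguin-ha.crm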
Finally, test the failover by shutting down penguin-ha1.
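A simple way to run that test, assuming the VIP is reachable from a client machine, is to stop Heartbeat on the active node and watch the resources move to the other one:
On penguin-ha1:
# service heartbeat stop
On penguin-ha2:
# crm status
After a short while havip1 and apacheha should be reported as Started on penguin-ha2 and the web server should keep answering on http://192.168.1.10/. Powering the node off instead of stopping the daemon is a harder test; in that case the failover only starts after the deadtime (30 seconds here) has expired.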
For more information:
http://www.clusterlabs.org/
http://www.linux-ha.org/wiki/Main_Page
It works as Active/Passive perfectly!
I am wondering how to configure Apache as Active/Active while keeping the VIP as Active/Passive?
Yes, of course, you can read my other post:
http://opentodo.wordpress.com/2012/04/29/load-balancing-with-ipvs-keepalived/
You can configure load balancing of web requests across the two servers and configure two director servers with the VIP as active/passive.
I think that colocation already handles the order of the resources, so no need to add an order constraint.