Heartbeat is a daemon that provides clustering services: it exchanges messages between the machines running it and checks their health. In this post I'll show how to configure a simple two-node cluster with failover, sharing a virtual IP. If one node fails, the other node takes over its role and the service keeps running. For this example I used Apache to obtain an active/passive service. Heartbeat is used to check whether all the nodes are alive, and it is recommended to use a dedicated interface for it.
Pacemaker is a resource manager that provides full management of the resources offered by the cluster. With Pacemaker we basically handle two types of resources: LSB resources, which are the init scripts provided by the Linux distribution and found under /etc/init.d/, and OCF resources, which provide things such as setting a virtual IP address, monitoring the health of a resource, starting/stopping a resource and so on.
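To see which resource agents are available in each class on your own system, the crm shell can list them (the exact output depends on the distribution and the installed packages):
# crm ra classes
# crm ra list lsb
# crm ra list ocf heartbeat
The first command lists the resource classes known to the cluster, the second one shows the LSB init scripts found under /etc/init.d/, and the third one shows the OCF agents shipped by the heartbeat provider.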
The configuration for our scenario is as follows:
penguin-ha1:
eth1 –> VIP (192.168.1.10)
eth2 –> 192.168.1.11 (Heartbeat)
penguin-ha2:
eth1 –> Backup
eth2 –> 192.168.1.12 (Heartbeat)
Configuring heartbeat
1.- Edit /etc/hosts:
192.168.1.11 penguin-ha1
192.168.1.12 penguin-ha2
and /etc/hostname on each node (penguin-ha1 or penguin-ha2 respectively):
penguin-ha?
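Heartbeat matches the node names declared in ha.cf against the output of uname -n, so it is worth making sure the hostname is really applied on each node before continuing. On the first node, for example (and the equivalent on penguin-ha2):
# echo penguin-ha1 > /etc/hostname
# hostname penguin-ha1
# uname -n
penguin-ha1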
2.- Install heartbeat and pacemaker:
# aptitude install heartbeat pacemaker
3.- Edit /etc/ha.d/ha.cf:
# File to write debug messages to
debugfile /var/log/ha-debug
# File to write other messages to
logfile /var/log/ha-log
# Facility to use for syslog()/logger
logfacility local0
# keepalive: how long between heartbeats?
# time units in seconds or milliseconds (ms)
keepalive 2
# deadtime: how long-to-declare-host-dead?
deadtime 30
# warntime: how long before issuing "late heartbeat" warning?
warntime 10
# initdead 120
# What UDP port to use for bcast/ucast communication?
udpport 694
# IP address of the other node (change it on every node)
ucast eth1 192.168.1.xx
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node penguin-ha1 penguin-ha2
# Hosts to try to ping
ping 192.168.1.1
# Enable Pacemaker
crm respawn
4.- Edit /etc/ha.d/authkeys:
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
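The key actually enabled here is the crc one, which only detects corruption and does not authenticate the peer. If the heartbeat traffic does not go over a dedicated crossover cable, a keyed hash is preferable; a minimal sketch of generating a random sha1 key (the key value itself is arbitrary, and the resulting file must be identical on both nodes):
# KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
# cat > /etc/ha.d/authkeys <<EOF
auth 2
2 sha1 $KEY
EOF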
5.- Set correct file permissions on authkeys:
# chmod 600 authkeys
6.- Restart Heartbeat:
# service heartbeat restart
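Before configuring Pacemaker you can verify that the two nodes see each other at the Heartbeat layer. The cl_status utility shipped with Heartbeat reports the local daemon status and the membership:
# cl_status hbstatus
# cl_status listnodes
# cl_status nodestatus penguin-ha2
The last command should report the peer node as active once both daemons are talking to each other.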
Configuring Pacemaker
1.- Verify the status of the cluster:
# crm status
============
Last updated: Wed Apr 4 17:49:36 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
2.- Disable stonith:
# crm configure property stonith-enabled=false
3.- Set the expected number of quorum votes (one per node):
# crm configure property expected-quorum-votes="2"
4.- To have quorum, more than half of the total number of cluster nodes must be online ((number of nodes / 2) + 1), which is no longer possible when a node fails in a 2-node cluster.
If you want the remaining node to keep providing all the cluster services, you need to set the no-quorum-policy to ignore:
# crm configure property no-quorum-policy=ignore
5.- To prevent automatic failback of a resource once a failed node comes back online:
# crm configure rsc_defaults resource-stickiness=100
6.- List the available scripts for the ocf class:
# crm ra list ocf
7.- Information for a script:
# crm ra info ocf:IPaddr2
8.- Add a VIP to our cluster:
# crm configure primitive havip1 ocf:IPaddr2 params ip=192.168.1.10 cidr_netmask=32 nic=eth1 op monitor interval=30s
9.- Check the status and see that the resource havip1 is started on the first node:
# crm status
============
Last updated: Wed Apr 4 19:14:23 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]

havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha1

# ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:e4:d5:27
          inet addr:192.168.1.10  Bcast:192.168.1.10  Mask:255.255.255.255
          inet6 addr: fe80::a00:27ff:fee4:d527/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11359 (11.0 KiB)  TX bytes:1536 (1.5 KiB)
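Note that the IPaddr2 agent adds the address with the ip(8) command, so depending on how it is labelled it may not always be visible in plain ifconfig output; ip addr is a more reliable way to check it:
# ip addr show eth1
The VIP 192.168.1.10 should be listed as an inet entry on eth1 of the node that runs the havip1 resource.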
Adding a daemon to our cluster
1.- See the script details:
# crm ra info ocf:anything
2.- Add the Apache daemon to our cluster:
# crm configure primitive apacheha lsb::apache2 op monitor interval=15s
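Pacemaker can only monitor an LSB resource reliably if the init script follows the LSB conventions, in particular the exit codes of the status action (0 while running, 3 when stopped). A quick sanity check on each node, ideally done before adding the resource, could be:
# /etc/init.d/apache2 status; echo $?
# /etc/init.d/apache2 stop
# /etc/init.d/apache2 status; echo $?
The first status should return 0 and the second one 3; if the cluster already manages the resource, it will simply restart Apache on the next monitor interval.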
3.- Ensure that the VIP and the Apache resources run on the same node:
# crm configure colocation apacheha-havip1 INFINITY: havip1 apacheha
4.- See the status of our cluster:
# crm status
============
Last updated: Wed Apr 4 19:32:45 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha2
apacheha (lsb:apache2): Started penguin-ha2
Configuring the start order of the services
# crm configure order ip-apache mandatory: havip1 apacheha
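As an alternative to keeping a separate colocation and order constraint, the two resources could also be put into a group, which implies both that they run on the same node and that they start in the listed order (the group name webgroup here is just an example):
# crm configure group webgroup havip1 apacheha
If you go this way, the individual apacheha-havip1 and ip-apache constraints are no longer needed.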
Migrate a resource to another node
# crm resource migrate havip1 penguin-ha1
# crm status
============
Last updated: Wed Apr 4 19:49:28 2012
Stack: Heartbeat
Current DC: penguin-ha1 (2629e6cc-fd9e-48d1-b28b-ed4f49a538e4) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ penguin-ha1 penguin-ha2 ]
havip1 (ocf::heartbeat:IPaddr2): Started penguin-ha1
apacheha (lsb:apache2): Started penguin-ha1
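The migrate command works by inserting a location constraint (it shows up as cli-prefer-havip1 in the configuration below); once the resource has moved you may want to clear it again so the resource is free to run on either node:
# crm resource unmigrate havip1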
See our cluster configuration:
# crm configure show
node $id="2629e6cc-fd9e-48d1-b28b-ed4f49a538e4" penguin-ha1
node $id="31062a6e-231d-405a-8490-113044c5ca06" penguin-ha2
primitive apacheha lsb:apache2 \
        op monitor interval="15s"
primitive havip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.10" cidr_netmask="32" nic="eth1" \
        op monitor interval="30s"
location cli-prefer-havip1 havip1 \
        rule $id="cli-prefer-rule-havip1" inf: #uname eq penguin-ha1
colocation apacheha-havip1 inf: havip1 apacheha
order ip-apache inf: havip1 apacheha
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
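Since crm configure show prints the whole configuration in crm syntax, redirecting it to a file is also an easy way to keep a backup (the file name is just an example); the crm shell's configure load command should be able to import it again on a rebuilt cluster:
# crm configure show > /root/penguin-ha.crm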
Finally, test the failover by shutting down penguin-ha1.
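A simple way to run that test, assuming the VIP is reachable from a client machine, is to stop Heartbeat on the active node and watch the resources move to the other one:
On penguin-ha1:
# service heartbeat stop
On penguin-ha2:
# crm status
After a short while havip1 and apacheha should be reported as Started on penguin-ha2 and the web server should keep answering on http://192.168.1.10/. Powering the node off instead of stopping the daemon is a harder test; in that case the failover only starts after the deadtime (30 seconds here) has expired.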
For more information:
http://www.clusterlabs.org/
http://www.linux-ha.org/wiki/Main_Page
It works as Active/Passive perfectly!
I am wondering how to configure Apache as Active/Active while keeping the VIP as Active/Passive?
Yes, of course, you can read my other post:
http://opentodo.wordpress.com/2012/04/29/load-balancing-with-ipvs-keepalived/
You can configure load balancing of web requests across the two servers and configure two director servers with the VIP as active/passive.
I think that colocation already handles the order of the resources, so no need to add an order constraint.