High available NFS server: Setup Corosync & Pacemaker

Introduction

This post continues the series on setting up a highly available NFS server. See the first post for the iSCSI storage part: http://opentodo.net/2015/06/high-available-nfs-server-setup-iscsi-multipath/
In this post I’ll explain how to set up the NFS cluster and the failover between the two servers configured in the first post, using Corosync as the cluster engine and Pacemaker as the cluster resource manager.

Corosync

Corosync is an open source cluster engine that passes messages between the servers of the cluster to check their health. When one of the servers goes down, it informs the other cluster components so that the failover process can start.

Pacemaker

Pacemaker is an open source high availability resource manager. Its task is to keep the configuration of all the cluster resources and the relations between servers and resources. For example, if we need to set up a VIP (virtual IP), mount a filesystem or start a service on the active node, Pacemaker starts all the resources assigned to that server in the order we specify in the configuration, so that every service comes up correctly.

Resource Agents

Resource agents are scripts that manage the different services. They are based on the OCF standard: http://opencf.org/home.html The system already ships with a set of agents that is usually enough for typical cluster setups, but you can of course develop a new one if your requirements call for it.

After this short introduction to the cluster components, let’s get started with the configuration:

Corosync configuration

– Install package dependencies:


# aptitude install corosync pacemaker

– Generate a private key to ensure the authenticity and privacy of the messages sent between the nodes of the cluster:


# corosync-keygen -l

NOTE: This command generates the private key at /etc/corosync/authkey. Copy the key file to the same path on the other server.
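Corosync expects the key to be readable by root only, and a copy can easily end up with looser permissions. As a rough sketch, the helper below checks the file mode on a node. The function name authkey_mode_ok is made up for this post, it assumes GNU stat, and the 400 mode is what corosync-keygen normally produces (worth double-checking on your distribution):

```shell
# Hypothetical helper: succeed only if the key file exists with mode 400.
# Assumes GNU coreutils stat (-c '%a' prints the octal permission bits).
authkey_mode_ok() {
    keyfile=${1:-/etc/corosync/authkey}
    [ -f "$keyfile" ] || return 1
    [ "$(stat -c '%a' "$keyfile")" = "400" ]
}
```

For example, after copying the key with something like scp -p /etc/corosync/authkey nfs2-srv:/etc/corosync/authkey, running authkey_mode_ok && echo "authkey looks fine" on each node gives a quick sanity check.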

– Edit /etc/corosync/corosync.conf:


# Please read the corosync.conf.5 manual page

totem {
    version: 2

    # How long before declaring a token lost (ms)
    token: 3000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10

    # How long to wait for join messages in the membership protocol (ms)
    join: 60

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600

    # Turn off the virtual synchrony filter
    vsftype: none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20

    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes

    # Enable encryption
    secauth: on

    # How many threads to use for encryption/decryption
    threads: 0

    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: active

    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.55.71.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

nodelist {
    node {
        ring0_addr: nfs1-srv
        nodeid: 1
    }
    node {
        ring0_addr: nfs2-srv
        nodeid: 2
    }
}

amf {
    mode: disabled
}

quorum {
    # Quorum for the Pacemaker Cluster Resource Manager
    provider: corosync_votequorum
    expected_votes: 1
}

service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 0
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
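A detail that is easy to get wrong in the interface section is bindnetaddr: it takes the network address of the cluster interface, not the address of the host. A small sketch to derive it from an address and prefix length follows. The function name network_addr is hypothetical, and 10.55.71.21/24 is used as input only because it is an address from this setup's subnet:

```shell
# Hypothetical helper: print the IPv4 network address for IP and PREFIX.
# Uses only POSIX shell arithmetic, so it should run under sh, dash or bash.
network_addr() {
    ip=$1
    prefix=$2
    old_ifs=$IFS
    IFS=.
    set -- $ip          # split the dotted quad into $1..$4
    IFS=$old_ifs
    ip_int=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    net=$(( ip_int & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
}

network_addr 10.55.71.21 24   # prints 10.55.71.0
```

The printed value matches the bindnetaddr used in the configuration above.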

Pacemaker configuration

– Tell Pacemaker to ignore loss of quorum, since this is a 2-node cluster:

# crm configure property no-quorum-policy=ignore
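Why this matters: under the usual majority rule, quorum needs more than half of the expected votes. A quick sketch of that arithmetic (the function name quorum_votes is made up for illustration) shows why a cluster that expects two votes cannot keep quorum once one node is gone:

```shell
# Votes needed for quorum: strictly more than half of the expected votes.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

quorum_votes 2   # prints 2: a lone survivor with 1 vote is below quorum
quorum_votes 3   # prints 2: one node can fail and quorum is kept
```

With three or more nodes the default policy is usually the safer choice, since a surviving majority still exists after a single failure.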

– Set up the VIP resource of the cluster:


# crm configure primitive p_ip_nfs ocf:heartbeat:IPaddr2 params ip="10.55.71.21" cidr_netmask="24" nic="eth0" op monitor interval="30s"
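Once the resource is running, the VIP should show up as an additional address on eth0 of the active node. A minimal sketch to check that from a script, assuming the one-line output format of the ip tool (the helper name has_inet_addr is hypothetical):

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and
# succeed if the given IPv4 address is configured.
# -F matches the string literally, so the dots are not treated as wildcards.
has_inet_addr() {
    grep -qF "inet $1/"
}
```

For example: ip -o -4 addr show dev eth0 | has_inet_addr 10.55.71.21 && echo "VIP is up on this node"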

– Set up the NFS server init script as a cluster resource:

# crm configure primitive p_lsb_nfsserver lsb:nfs-kernel-server op monitor interval="30s"

NOTE: The nfs-kernel-server init script will now be managed by the cluster, so remove it from the boot sequence with the update-rc.d utility:

# update-rc.d -f nfs-kernel-server remove

– Configure the mount point for the NFS export:

# crm configure primitive p_fs_nfs ocf:heartbeat:Filesystem params device="/dev/mapper/nfs1" directory="/mnt/nfs" fstype="ext3" op start interval="0" timeout="120" op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" op stop interval="0" timeout="240"
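To confirm on the active node that the Filesystem agent really mounted the device where expected, the mounts table can be checked directly. This is only a sketch: the helper name is_mounted_at is hypothetical, and it assumes the device appears in /proc/mounts under the same path that was given to the resource:

```shell
# Hypothetical helper: succeed if DEVICE is mounted on DIR according to a
# mounts table in /proc/mounts format (device, mount point, type, ...).
# The optional third argument points at an alternative table file.
is_mounted_at() {
    awk -v dev="$1" -v dir="$2" \
        '$1 == dev && $2 == dir { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}
```

For example: is_mounted_at /dev/mapper/nfs1 /mnt/nfs && echo "NFS volume is mounted"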

– Configure a resource group with the NFS service, the mount point and the VIP:

# crm configure group g_nfs p_fs_nfs p_lsb_nfsserver p_ip_nfs meta target-role="Started"
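The order of the members matters here: a Pacemaker group keeps its resources on the same node, starts them in the listed order and stops them in reverse. The toy function below (stop_order is a made-up name) only illustrates that reversal for the group defined above:

```shell
# Print the arguments in reverse, i.e. the order a group stops its members.
stop_order() {
    reversed=
    for rsc in "$@"; do
        reversed="$rsc${reversed:+ $reversed}"
    done
    echo "$reversed"
}

stop_order p_fs_nfs p_lsb_nfsserver p_ip_nfs
# prints: p_ip_nfs p_lsb_nfsserver p_fs_nfs
```

So on failover the VIP is released first and the filesystem is unmounted last, which is what we want before the other node mounts it.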

– Prevent healthy resources from being moved around the cluster by configuring resource stickiness:

# crm configure rsc_defaults resource-stickiness=200

Check cluster status

– Check the status of the resources of the cluster:


# crm status
Last updated: Wed Jun 3 21:44:29 2015
Last change: Wed Jun 3 16:56:15 2015 via crm_resource on nfs1-srv
Stack: corosync
Current DC: nfs1-srv (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Online: [ nfs1-srv nfs2-srv ]

Resource Group: g_nfs
p_lsb_nfsserver (lsb:nfs-kernel-server): Started nfs2-srv
p_ip_nfs (ocf::heartbeat:IPaddr2): Started nfs2-srv
p_fs_nfs (ocf::heartbeat:Filesystem): Started nfs2-srv
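For monitoring scripts it can be handy to reduce this output to the node that currently runs the resources. A small sketch, assuming the status lines end in "Started <node>" as in the output above (the helper name started_nodes is hypothetical):

```shell
# Hypothetical helper: read `crm status` output on stdin and print each
# distinct node that has a resource in the "Started" state.
started_nodes() {
    awk 'NF >= 2 && $(NF-1) == "Started" { print $NF }' | sort -u
}
```

Running crm status | started_nodes on a healthy cluster should print a single node name, since all the resources of the group live on the same server.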

Cluster failover

– If the resources are running on nfs2-srv and we want to fail over to nfs1-srv:

# crm resource move g_nfs nfs1-srv

– Remove all constraints created by the move command:

# crm resource unmove g_nfs
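The move command works by adding a location constraint to the configuration, and if it is left behind the group stays pinned to one node. A rough way to spot leftovers is sketched below. The helper name cli_constraints is hypothetical, and the cli-prefer- and cli-ban- id prefixes are an assumption about how the tools name these constraints, so verify them on your own cluster:

```shell
# Hypothetical helper: read `crm configure show` output on stdin and print
# location constraints whose id starts with cli-prefer- or cli-ban-.
# Those id prefixes are an assumption; check them against your cluster.
cli_constraints() {
    grep -E '^location cli-(prefer|ban)-' || true
}
```

Running crm configure show | cli_constraints should print nothing once the unmove command has cleaned up.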

Resulting configuration


# crm configure show
node $id="1" nfs1-srv
node $id="2" nfs2-srv
primitive p_fs_nfs ocf:heartbeat:Filesystem \
    params device="/dev/mapper/nfs-part1" directory="/mnt/nfs" fstype="ext3" options="_netdev" \
    op start interval="0" timeout="120" \
    op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
    op stop interval="0" timeout="240"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
    params ip="10.55.71.21" cidr_netmask="24" nic="eth0" \
    op monitor interval="30s"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
    op monitor interval="30s"
group g_nfs p_lsb_nfsserver p_ip_nfs \
    meta target-role="Started"
colocation c_nfs_on_fs inf: p_lsb_nfsserver p_fs_nfs
order o_volume_before_nfs inf: p_fs_nfs g_nfs:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-42f2063" \
    cluster-infrastructure="corosync" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="200"

References

https://wiki.ubuntu.com/ClusterStack/Natty
http://clusterlabs.org/quickstart-ubuntu.html
http://clusterlabs.org/doc/
