Introduction
This post is the continuation of the series on setting up a highly available NFS server; check the first post for the iSCSI storage part: http://opentodo.net/2015/06/high-available-nfs-server-setup-iscsi-multipath/
In this post I'll explain how to set up the NFS cluster and the failover between the two servers prepared in the first post, using Corosync as the cluster engine and Pacemaker as the resource manager of the cluster.
Corosync
Corosync is an open source cluster engine that allows the servers of the cluster to exchange messages, check each other's health status, and inform the other cluster components when one of the servers goes down so the failover process can start.
Pacemaker
Pacemaker is an open source high availability resource manager. Its task is to keep the configuration of all the resources of the cluster and the relations between servers and resources. For example, if we need to set up a VIP (virtual IP), mount a filesystem or start a service on the active node of the cluster, Pacemaker will start all the resources assigned to that node in the order we specify in the configuration, to ensure all the services come up correctly.
Resource Agents
They are just scripts that manage different services. These scripts are based on the OCF standard: http://opencf.org/home.html The system already comes with a set of scripts, which most of the time will be enough for typical cluster setups, but of course it's possible to develop new ones depending on your needs and requirements.
So after this small introduction to the cluster components, let's get started with the configuration:
Corosync configuration
– Install package dependencies:
# aptitude install corosync pacemaker
– Generate a private key to ensure the authenticity and privacy of the messages sent between the nodes of the cluster:
# corosync-keygen -l
NOTE: This command generates the private key at /etc/corosync/authkey. Copy this key file to the other server of the cluster.
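For example, assuming root SSH access between the two nodes from the first post, the key can be copied over while preserving its restrictive permissions:
# scp -p /etc/corosync/authkey root@nfs2-srv:/etc/corosync/authkey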
– Edit /etc/corosync/corosync.conf:
# Please read the openais.conf.5 manual page
totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 3000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 3600

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Enable encryption
        secauth: on

        # How many threads to use for encryption/decryption
        threads: 0

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: active

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 10.55.71.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

nodelist {
        node {
                ring0_addr: nfs1-srv
                nodeid: 1
        }
        node {
                ring0_addr: nfs2-srv
                nodeid: 2
        }
}

amf {
        mode: disabled
}

quorum {
        # Quorum for the Pacemaker Cluster Resource Manager
        provider: corosync_votequorum
        expected_votes: 1
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
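Once this file is in place on both nodes (adjusting bindnetaddr, the node addresses and the multicast settings to your environment), the cluster engine has to be restarted so the nodes can see each other. A minimal sketch for the Debian/Ubuntu sysvinit setup assumed in this series, where the service may also need to be enabled in /etc/default/corosync and, depending on the packaged versions, Pacemaker may need to be started separately:
# sed -i 's/^START=no/START=yes/' /etc/default/corosync
# service corosync restart
# service pacemaker restart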
Pacemaker configuration
– Disable the quorum policy, since we need to deploy a 2-node configuration:
# crm configure property no-quorum-policy=ignore
– Setup the VIP resource of the cluster:
# crm configure primitive p_ip_nfs ocf:heartbeat:IPaddr2 params ip="10.55.71.21" cidr_netmask="24" nic="eth0" op monitor interval="30s"
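Once Pacemaker starts this resource, the VIP should show up as an additional address on eth0 of the active node; a quick way to verify:
# ip addr show eth0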
– Setup the init script for the NFS server:
# crm configure primitive p_lsb_nfsserver lsb:nfs-kernel-server op monitor interval="30s"
NOTE: The nfs-kernel-server init script will be managed by the cluster, so disable the service from starting at boot time using the update-rc.d utility:
# update-rc.d -f nfs-kernel-server remove
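If the service was already running at this point, it may also be worth stopping it manually so that only the cluster controls it from now on:
# service nfs-kernel-server stop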
– Configure the mount point for the NFS export:
# crm configure primitive p_fs_nfs ocf:heartbeat:Filesystem params device="/dev/mapper/nfs1" directory="/mnt/nfs" fstype="ext3" op start interval="0" timeout="120" op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" op stop interval="0" timeout="240"
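When the cluster brings this resource up, the export filesystem should be mounted on the active node; a quick check:
# mount | grep /mnt/nfs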
– Configure a resource group with the nfs service, the mountpoint and the VIP:
# crm configure group g_nfs p_fs_nfs p_lsb_nfsserver p_ip_nfs meta target-role="Started"
– Prevent healthy resources from being moved around the cluster by configuring a resource stickiness:
# crm configure rsc_defaults resource-stickiness=200
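The resulting configuration shown at the end of this post also ties the NFS server to the node where the filesystem is mounted with a colocation constraint, and makes sure the volume is mounted before the group starts with an ordering constraint. Using the names from that configuration, they can be added along these lines:
# crm configure colocation c_nfs_on_fs inf: p_lsb_nfsserver p_fs_nfs
# crm configure order o_volume_before_nfs inf: p_fs_nfs g_nfs:start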
Check cluster status
– Check the status of the resources of the cluster:
# crm status
Last updated: Wed Jun 3 21:44:29 2015
Last change: Wed Jun 3 16:56:15 2015 via crm_resource on nfs1-srv
Stack: corosync
Current DC: nfs1-srv (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Online: [ nfs1-srv nfs2-srv ]

 Resource Group: g_nfs
     p_lsb_nfsserver    (lsb:nfs-kernel-server):    Started nfs2-srv
     p_ip_nfs           (ocf::heartbeat:IPaddr2):   Started nfs2-srv
     p_fs_nfs           (ocf::heartbeat:Filesystem): Started nfs2-srv
Cluster failover
– If the resources are running on nfs2-srv and we want to fail over to nfs1-srv:
# crm resource move g_nfs nfs1-srv
– Remove all constraints created by the move command:
# crm resource unmove g_nfs
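The move command works by adding a location constraint for the group; after unmove it should no longer appear in the configuration, which can be checked with, for example:
# crm configure show | grep location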
Resulting configuration
# crm configure show
node $id="1" nfs1-srv
node $id="2" nfs2-srv
primitive p_fs_nfs ocf:heartbeat:Filesystem \
        params device="/dev/mapper/nfs-part1" directory="/mnt/nfs" fstype="ext3" options="_netdev" \
        op start interval="0" timeout="120" \
        op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
        op stop interval="0" timeout="240"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="10.55.71.21" cidr_netmask="24" nic="eth0" \
        op monitor interval="30s"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
        op monitor interval="30s"
group g_nfs p_lsb_nfsserver p_ip_nfs \
        meta target-role="Started"
colocation c_nfs_on_fs inf: p_lsb_nfsserver p_fs_nfs
order o_volume_before_nfs inf: p_fs_nfs g_nfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"
References
https://wiki.ubuntu.com/ClusterStack/Natty
http://clusterlabs.org/quickstart-ubuntu.html
http://clusterlabs.org/doc/
This post is the second part of the series High available NFS server; you can find the first part here: http://opentodo.net/2015/06/high-available-nfs-server-setup-iscsi-multipath/