Recently I’ve been playing with Redis, studying it as an alternative to memcached for one project. One of my main requirements was a good failover solution for a key-value memory database. With plain memcached you can implement it from code (doing get/set calls and checking the availability of the servers), or better, use the repcached patch for memcached; the first is not a clean solution at all, and I was not very convinced by repcached. After getting more involved with the features Redis offers, one of the most interesting is persistence to disk: Redis periodically stores the in-memory key/value pairs on disk, so in case of failure you recover the data from the last snapshot after restart. Note that the write to disk is not performed on the fly, so you can lose some data; Redis offers different persistence setups, and it’s important to understand how they work. In any case, remember that you’re working with an in-memory key-value cache, so it’s not the place to persist critical data. A good read that I recommend to understand how persistence works:
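For reference, both persistence modes mentioned above are configured in /etc/redis/redis.conf; a minimal sketch (the save values shown are the stock defaults, adapt them to your data-loss tolerance):

```
# RDB snapshots: dump the dataset to disk if at least <changes> keys
# changed within <seconds> seconds
save 900 1
save 300 10
save 60 10000

# AOF: append every write operation to a log replayed at startup;
# much smaller data-loss window at the cost of bigger files
appendonly yes
appendfsync everysec
```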
http://oldblog.antirez.com/post/redis-persistence-demystified.html
Another interesting feature that I really appreciate in this solution is the possibility to work with different data structures: lists, hashes, sets and sorted sets. This gives you more flexibility to store different kinds of values under the same key, with native data types supported by the client library of your programming language. You can take a look here at the different data structures available in Redis:
http://redis.io/topics/data-types
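As a quick illustration of those types, a hypothetical redis-cli session against a local server (key names are made up for the example):

```
redis> LPUSH queue:jobs "job1"        # list: push/pop from both ends
redis> HSET user:1000 name "Alice"    # hash: field/value map under one key
redis> SADD tags:post1 "redis"        # set: unordered, unique members
redis> ZADD ranking 42 "player1"      # sorted set: members ordered by score
```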
After this small introduction on why I chose Redis, I’ll now talk about the failover solution. Redis supports asynchronous master-slave replication, and Sentinel provides the failover. Sentinel has shipped with Redis since version 2.6, but the project documentation recommends the version shipped with Redis 2.8, which received very important enhancements. Sentinel is a distributed system: the different processes communicate with each other through messages, use an agreement protocol to elect a new master, and inform clients of the address of the current master of the cluster.
We’ll run Sentinel on our systems as a separate daemon listening on a different port. It will communicate with the other Sentinels in the cluster to alert them in the event of a node failure and to choose a new master. Sentinel will rewrite the configuration files of our servers to reattach a recovered node to the cluster (set up as slave) or to promote a slave to master. The basic process to choose a new master is this:
1.- One Sentinel node detects a server failure in the cluster after the number of milliseconds set in the “down-after-milliseconds“ directive. At this moment that Sentinel marks the instance as subjectively down (SDOWN).
2.- When enough Sentinels agree about the master failure, the node is marked as objectively down (ODOWN) and the failover is triggered. The number of Sentinels required is configured per master (the quorum).
3.- Even after the failover is triggered, it still cannot be performed right away: it is subject to an election, and at least a majority of Sentinels must authorize the one that will perform the failover.
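The SDOWN/ODOWN/failover distinction above can be sketched as a simple vote count. This is an illustrative sketch only, not Sentinel’s actual implementation; the function name and states are made up:

```python
# Illustrative sketch of Sentinel's failure-detection states (not the real algorithm).
def master_state(votes_down, total_sentinels, quorum):
    """votes_down: number of sentinels that currently see the master as down.

    SDOWN is one sentinel's local opinion; ODOWN needs `quorum` agreeing
    sentinels; actually performing the failover additionally requires a
    majority of all known sentinels to authorize it.
    """
    if votes_down == 0:
        return "OK"
    if votes_down < quorum:
        return "SDOWN"       # only a subjective view, no failover yet
    if votes_down * 2 <= total_sentinels:
        return "ODOWN"       # quorum reached, but no majority to authorize failover
    return "FAILOVER"        # quorum reached and a majority can elect a leader

# With 3 sentinels and quorum 2 (the setup used in this article):
print(master_state(1, 3, 2))  # SDOWN
print(master_state(2, 3, 2))  # FAILOVER
```

This also shows why a quorum lower than the majority only marks the master ODOWN: the failover itself still waits for a majority.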
Basically we’ll need a minimum of three nodes in the cluster for a Redis failover setup. In my case I chose two Redis servers (master & slave), both also running Sentinel, and a third node running just Sentinel for the quorum. For more information about the failover process and Sentinel you can check the official documentation:
http://redis.io/topics/sentinel
After these basic notes about how Redis & Sentinel work, we can begin with the setup. For this environment I used a total of three servers running Ubuntu 14.04, so all I need to do is install redis-server from the repositories. Note that if you’re using another GNU/Linux distribution or an older Ubuntu version, you may need to compile and install it by hand.
– Setup for redis sentinels (nodes 1,2,3) /etc/redis/sentinel.conf:
# port <sentinel-port>
# The port that this sentinel instance will run on
port 26379
daemonize yes
pidfile /var/run/redis/redis-sentinel.pid
loglevel notice
logfile /var/log/redis/redis-sentinel.log

# Master setup
# sentinel monitor <master-name> <ip> <port> <quorum>
# Minimum of two sentinels to declare an ODOWN
sentinel monitor mymaster 172.16.23.33 6379 2
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 5000
# sentinel failover-timeout <master-name> <milliseconds>
sentinel failover-timeout mymaster 900000
# sentinel parallel-syncs <master-name> <numslaves>
sentinel parallel-syncs mymaster 1

# Second monitored master
sentinel monitor resque 172.16.23.34 6379 2
sentinel down-after-milliseconds resque 5000
sentinel failover-timeout resque 900000
sentinel parallel-syncs resque 4
– Create init scripts for sentinels (nodes 1,2,3) /etc/init.d/redis-sentinel:
#! /bin/sh
### BEGIN INIT INFO
# Provides:          redis-sentinel
# Required-Start:    $syslog $remote_fs
# Required-Stop:     $syslog $remote_fs
# Should-Start:      $local_fs
# Should-Stop:       $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: redis-sentinel - Persistent key-value db
# Description:       redis-sentinel - Persistent key-value db
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/bin/redis-sentinel
DAEMON_ARGS=/etc/redis/sentinel.conf
NAME=redis-sentinel
DESC=redis-sentinel
RUNDIR=/var/run/redis
PIDFILE=$RUNDIR/redis-sentinel.pid

test -x $DAEMON || exit 0

if [ -r /etc/default/$NAME ]
then
    . /etc/default/$NAME
fi

. /lib/lsb/init-functions

set -e

case "$1" in
  start)
    echo -n "Starting $DESC: "
    mkdir -p $RUNDIR
    touch $PIDFILE
    chown redis:redis $RUNDIR $PIDFILE
    chmod 755 $RUNDIR

    if [ -n "$ULIMIT" ]
    then
        ulimit -n $ULIMIT
    fi

    if start-stop-daemon --start --quiet --umask 007 --pidfile $PIDFILE --chuid redis:redis --exec $DAEMON -- $DAEMON_ARGS
    then
        echo "$NAME."
    else
        echo "failed"
    fi
    ;;
  stop)
    echo -n "Stopping $DESC: "
    if start-stop-daemon --stop --retry forever/TERM/1 --quiet --oknodo --pidfile $PIDFILE --exec $DAEMON
    then
        echo "$NAME."
    else
        echo "failed"
    fi
    rm -f $PIDFILE
    sleep 1
    ;;
  restart|force-reload)
    ${0} stop
    ${0} start
    ;;
  status)
    echo -n "$DESC is "
    if start-stop-daemon --stop --quiet --signal 0 --name ${NAME} --pidfile ${PIDFILE}
    then
        echo "running"
    else
        echo "not running"
        exit 1
    fi
    ;;
  *)
    echo "Usage: /etc/init.d/$NAME {start|stop|restart|force-reload|status}" >&2
    exit 1
    ;;
esac

exit 0
– Give execution permission on the script:
# chmod +x /etc/init.d/redis-sentinel
– Start the script automatically at boot time:
# update-rc.d redis-sentinel defaults
– Change the owner & group of /etc/redis/ to allow Sentinel to change the configuration files:
# chown -R redis:redis /etc/redis/
– On node 3 I’ll not use redis-server, so I can remove the init script:
# update-rc.d redis-server remove
– Edit the configuration of the Redis server on nodes 1,2 (/etc/redis/redis.conf) with the proper setup for your project. The only requirement to work with Sentinel is to set the proper IP address in the bind directive. All the directives are commented in the file and are very clear, so take your time to adapt Redis to your project.
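A minimal sketch of that requirement (the master IP is the one used in this article; the slave address is a placeholder for whatever node 2 uses):

```
# node 1 (master), /etc/redis/redis.conf
bind 172.16.23.33

# node 2 (slave), /etc/redis/redis.conf
bind <node2-ip>
slaveof 172.16.23.33 6379
```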
– Connecting to our redis cluster:
Now we have our Redis cluster ready to store our data. In my case I work with Perl, and currently I’m using this library: http://search.cpan.org/dist/Redis/lib/Redis.pm, which you can install using the cpan tool. Note that the version coming from the Ubuntu repositories (libredis-perl) is quite old and doesn’t implement the Sentinel interface, so it’s better to install the module from CPAN.
So, to connect to our cluster as documented in the client library, I used the following call:
my $cache = Redis->new(sentinels => [ "redis1:26379", "redis2:26379", "node3:26379" ], service => 'mymaster' );
Basically the library will try to connect to the different Sentinel servers and get the address of the current master Redis server, which is where our system will read and store the data.
Another solution, instead of connecting from our scripts to the different Sentinel servers, is to use HAProxy as a single entry point for our clients. HAProxy can check the different Redis servers for the string “role:master” and redirect all the requests to that server.
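A sketch of such a check using HAProxy's tcp-check directives (untested here; the master IP is the one from the Sentinel configuration above, the slave IP is a placeholder):

```
# /etc/haproxy/haproxy.cfg (fragment)
listen redis
    bind *:6379
    mode tcp
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    # only the instance answering "role:master" passes the health check
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis1 172.16.23.33:6379 check inter 1s
    server redis2 <slave-ip>:6379 check inter 1s
```

With this setup clients connect to HAProxy only, and after a Sentinel failover the health check naturally moves the traffic to the newly promoted master.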
Take a look at the documentation of the client library for the programming language used in your project. The different clients currently supported by Redis are listed here:
http://redis.io/clients
– Sources:
http://redis.io/documentation
This article is a great introduction on how to make Redis highly available using Sentinel. Do remember to test your setup extensively and understand the capabilities/limitations of the solution – HA doesn’t come easily.
Thanks for a great article, it definitely helps to get going with Sentinel.
Two things:
1) update-rc.d sentinel defaults should be update-rc.d redis-sentinel defaults
2) echo “Usage: /etc/init.d/$NAME {start|stop|restart|force-reload|status}” >&2 – html entities needs to be replaced with correct characters
Good catch Tom
Fixed! Thanks a mil Tom! 😀
Great article, however, I did have one question:
If I have a pretty simple setup, say one master, and one slave both running redis (with slaveof configured to point to the master), should I have two sentinel processes running? (One on slave, and one on master), or is there only one on master needed?
Thanks!
Another thing to note is
s/NAME=redis-sentinel/NAME=redis-server
otherwise /etc/init.d/redis-sentinel status doesn’t work.

Hi Guys,
I am wondering if you may be able to give me some pointers or point me to the right place.
For a project at a client, we have set up Redis 2.8 (on Solaris, compiled from source) in HA. The setup ran perfectly fine for a few days before going crazy suddenly. Crazy = the sentinels failed over the master every few minutes (sometimes even multiple failovers within the same hour) continuously for a few days (about 10 days). This is after the same setup had been running fine (occasional SDOWN, but no failover) for more than a week. We moved the test users to another setup, and this problematic setup became normal after a few days.
In the master Redis log, I can see that the slaves were disconnected at the time the sentinels failed over the master. However, what is strange to me is that the master was able to come back up within 1 second of the failover being initiated (down-after-milliseconds was not set, so the default 30 seconds should apply for an SDOWN condition). Have you guys seen any strange behaviour from redis/sentinel, with +sdown before 30 seconds?
The problem I have is that we are not able to replicate the issue, but the incident above (which happened a while back while I was on leave) has left us with very little confidence in the HA of Redis. I have logs for Redis and the sentinels from the setup (but not for the network and the servers) and was wondering if I should be looking for specific problems with our configuration and information in the logs.
The setup is 3 instances of redis on 3 different servers (on the same LAN), with one sentinel running on each of the 3 servers.
Appreciate any ideas.
Faisal
Pingback:My Great Guardian – Watching Redis With Sentinel | A posteriori