Highly available NFS server: Setup Corosync & Pacemaker

Introduction

This post is the continuation of the series on setting up a highly available NFS server; check the first post for the iSCSI storage part: http://opentodo.net/2015/06/high-available-nfs-server-setup-iscsi-multipath/
In this post I’ll explain how to set up the NFS cluster and the failover between the two servers configured in the first post, using Corosync as the cluster engine and Pacemaker as the cluster resource manager.

Corosync

Corosync is an open source cluster engine that exchanges messages between the servers of the cluster to check their health status. If one of the servers goes down, it informs the other components of the cluster so the failover process can start.

Pacemaker

Pacemaker is an open source high availability resource manager. Its task is to keep the configuration of all the cluster resources and the relations between servers and resources. For example, if we need to set up a VIP (virtual IP), mount a filesystem or start a service on the active node of the cluster, Pacemaker will bring up all the resources assigned to that node in the order specified in the configuration, to ensure all the services start correctly.

Resource Agents

These are simply scripts that manage different services. They are based on the OCF standard: http://opencf.org/home.html The system already ships with a set of scripts which will be enough for most typical cluster setups, but it is of course possible to develop new ones depending on your needs and requirements.
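
On Debian/Ubuntu the OCF agents shipped with the resource-agents package live under /usr/lib/ocf/resource.d/. As a quick reference (a small sketch; paths may vary between distributions), you can list them and inspect the parameters of a given agent with crmsh:

# ls /usr/lib/ocf/resource.d/heartbeat/
# crm ra list ocf heartbeat
# crm ra info ocf:heartbeat:IPaddr2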


So after this small introduction about the cluster components, let’s get started with the configuration:

Corosync configuration

– Install package dependencies:


# aptitude install corosync pacemaker

– Generate a private key to ensure the authenticity and privacy of the messages sent between the nodes of the cluster:


# corosync-keygen -l

NOTE: This command will generate the private key at /etc/corosync/authkey. Copy the key file to the other server.
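
For example, assuming root SSH access between the two nodes (hostnames as defined later in the nodelist), the key can be copied over and locked down like this:

# scp -p /etc/corosync/authkey root@nfs2-srv:/etc/corosync/
# ssh root@nfs2-srv 'chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey'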

– Edit /etc/corosync/corosync.conf:


# Please read the openais.conf.5 manual page

totem {
    version: 2

    # How long before declaring a token lost (ms)
    token: 3000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10

    # How long to wait for join messages in the membership protocol (ms)
    join: 60

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600

    # Turn off the virtual synchrony filter
    vsftype: none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20

    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes

    # Enable encryption
    secauth: on

    # How many threads to use for encryption/decryption
    threads: 0

    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: active

    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.55.71.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

nodelist {
    node {
        ring0_addr: nfs1-srv
        nodeid: 1
    }
    node {
        ring0_addr: nfs2-srv
        nodeid: 2
    }
}

amf {
    mode: disabled
}

quorum {
    # Quorum for the Pacemaker Cluster Resource Manager
    provider: corosync_votequorum
    expected_votes: 1
}

service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 0
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
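
Once the configuration file and the authkey are in place on both nodes, Corosync can be started and the membership verified. A rough sketch for Ubuntu 14.04 (on Debian/Ubuntu the init script only starts Corosync if START=yes is set in /etc/default/corosync; depending on the packaged versions, Pacemaker is either launched by Corosync through the service block above or needs to be started as its own service):

# sed -i 's/^START=no/START=yes/' /etc/default/corosync
# service corosync start
# service pacemaker start
# corosync-cfgtool -s      # ring status of the local node
# crm_mon -1               # one-shot view of cluster membership and resources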

 

Pacemaker configuration

– Disable the quorum policy, since this is a two-node cluster:

# crm configure property no-quorum-policy=ignore

– Setup the VIP resource of the cluster:


# crm configure primitive p_ip_nfs ocf:heartbeat:IPaddr2 params ip="10.55.71.21" cidr_netmask="24" nic="eth0" op monitor interval="30s"

– Setup the init script for the NFS server:

# crm configure primitive p_lsb_nfsserver lsb:nfs-kernel-server op monitor interval="30s"

NOTE: The nfs-kernel-server init script will be managed by the cluster, so disable it from starting at boot time using the update-rc.d utility:

# update-rc.d -f nfs-kernel-server remove

 

– Configure the mount point for the NFS export:

# crm configure primitive p_fs_nfs ocf:heartbeat:Filesystem params device="/dev/mapper/nfs-part1" directory="/mnt/nfs" fstype="ext3" op start interval="0" timeout="120" op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" op stop interval="0" timeout="240"

– Configure a resource group with the nfs service, the mountpoint and the VIP:

# crm configure group g_nfs p_fs_nfs p_lsb_nfsserver p_ip_nfs meta target-role="Started"
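
Note that the resulting configuration shown at the end of this post keeps p_fs_nfs outside the group and ties it to the NFS server with an explicit colocation and ordering constraint instead. If you prefer that variant, the constraints can be added like this:

# crm configure colocation c_nfs_on_fs inf: p_lsb_nfsserver p_fs_nfs
# crm configure order o_volume_before_nfs inf: p_fs_nfs g_nfs:start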

 

– Prevent healthy resources from being moved around the cluster by configuring resource stickiness:

# crm configure rsc_defaults resource-stickiness=200
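
The NFS exports themselves are not managed by the cluster in this setup, so /etc/exports has to be kept identical on both nodes. A minimal sketch, assuming the clients live on the 10.55.71.0/24 network used for the NFS service (adjust paths and options to your environment):

# cat /etc/exports
/mnt/nfs 10.55.71.0/24(rw,sync,no_subtree_check)

# exportfs -ra    # reload the export table on the active node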

 

Check cluster status

– Check the status of the resources of the cluster:


# crm status
Last updated: Wed Jun 3 21:44:29 2015
Last change: Wed Jun 3 16:56:15 2015 via crm_resource on nfs1-srv
Stack: corosync
Current DC: nfs1-srv (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
3 Resources configured

Online: [ nfs1-srv nfs2-srv ]

Resource Group: g_nfs
p_lsb_nfsserver (lsb:nfs-kernel-server): Started nfs2-srv
p_ip_nfs (ocf::heartbeat:IPaddr2): Started nfs2-srv
p_fs_nfs (ocf::heartbeat:Filesystem): Started nfs2-srv

 

Cluster failover

– If the resources are running on nfs2-srv and we want to fail over to nfs1-srv:

# crm resource move g_nfs nfs1-srv

– Remove all constraints created by the move command:

# crm resource unmove g_nfs
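
Another way to exercise the failover without leaving constraints behind is to put a node in standby mode and bring it back afterwards:

# crm node standby nfs2-srv    # resources migrate away from nfs2-srv
# crm status                   # g_nfs should now be running on nfs1-srv
# crm node online nfs2-srv     # the node becomes available again as a failover target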

 

Resulting configuration


# crm configure show
node $id="1" nfs1-srv
node $id="2" nfs2-srv
primitive p_fs_nfs ocf:heartbeat:Filesystem \
        params device="/dev/mapper/nfs-part1" directory="/mnt/nfs" fstype="ext3" options="_netdev" \
        op start interval="0" timeout="120" \
        op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
        op stop interval="0" timeout="240"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="10.55.71.21" cidr_netmask="24" nic="eth0" \
        op monitor interval="30s"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
        op monitor interval="30s"
group g_nfs p_lsb_nfsserver p_ip_nfs \
        meta target-role="Started"
colocation c_nfs_on_fs inf: p_lsb_nfsserver p_fs_nfs
order o_volume_before_nfs inf: p_fs_nfs g_nfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"

 

References

https://wiki.ubuntu.com/ClusterStack/Natty
http://clusterlabs.org/quickstart-ubuntu.html
http://clusterlabs.org/doc/

This post is the second part of the Highly available NFS server series; you can find the first part here.

Highly available NFS server: Setup iSCSI & multipath

Introduction

In this series of posts I’ll explain how to set up a highly available and redundant NFS cluster using iSCSI with DM-Multipath, and Corosync & Pacemaker to manage the cluster and its resources. The objective of this scenario is to create a redundant, fault-tolerant NFS storage with automatic failover, to keep the NFS exports available as much of the time as possible.

For this environment I’ve used two servers running Ubuntu 14.04.2 LTS, each with two NICs: one to provide the NFS service to the clients and another to connect to the iSCSI SAN network. The iSCSI SAN storage device has two physical adapters with two network interfaces each, providing redundant network access and two physical paths to the storage system. Both NFS servers will attach the same LUN device, each using a different InitiatorName, and will be set up with device mapper multipathing (DM-Multipath), which allows you to combine multiple I/O paths between server nodes and storage arrays into a single device. These I/O paths are physical SAN connections that can include separate cables, switches and controllers, so essentially each NFS server sees a single block device.


The cluster software used is Corosync, with Pacemaker as the resource manager. Pacemaker is responsible for assigning a VIP (virtual IP address), mounting the filesystem from the block device and starting the NFS service with the specific exports for the clients on the active node of the cluster. If the active node fails, the resources are migrated to the passive node and the services continue to operate as if nothing had happened.

This post covers the configuration of the iSCSI initiator on both NFS servers and the device mapper multipathing setup; for the cluster configuration with Corosync and Pacemaker, check the second part: http://opentodo.net/2015/06/high-available-nfs-server-setup-corosync-pacemaker/

So let’s get started with the setup!

iSCSI initiator configuration

– Install dependencies:

# aptitude install multipath-tools open-iscsi

Server 1

– Edit configuration file /etc/iscsi/initiatorname.iscsi:

InitiatorName=iqn.1647-03.com.cisco:01.vdsk-nfs1

Server 2

– Edit configuration file /etc/iscsi/initiatorname.iscsi:

InitiatorName=iqn.1647-03.com.cisco:01.vdsk-nfs2

NOTE: the initiator identifiers on the two servers are different, but both are associated with the same LUN device.
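
After changing the InitiatorName, restart the iSCSI initiator service on each server so the new identifier is used (service name as shipped with open-iscsi on Ubuntu 14.04):

# service open-iscsi restart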

– Run a discovery of the iSCSI targets:

# iscsiadm -m discovery -t sendtargets -p 10.54.61.35
# iscsiadm -m discovery -t sendtargets -p 10.54.61.36
# iscsiadm -m discovery -t sendtargets -p 10.54.61.37
# iscsiadm -m discovery -t sendtargets -p 10.54.61.38

– Connect and log in to the iSCSI targets:

# iscsiadm -m node -T iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.a -p 10.54.61.35 --login
# iscsiadm -m node -T iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.a -p 10.54.61.36 --login
# iscsiadm -m node -T iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.b -p 10.54.61.37 --login
# iscsiadm -m node -T iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.b -p 10.54.61.38 --login

– Check the target records registered for the iSCSI SAN device:

# iscsiadm -m node
10.54.61.35:3260,1 iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.a
10.54.61.36:3260,2 iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.a
10.54.61.37:3260,1 iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.b
10.54.61.38:3260,2 iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.b
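
So that the sessions are re-established automatically after a reboot, you can either set node.startup = automatic in /etc/iscsi/iscsid.conf before running the discovery, or update the existing node records, for example (repeat for each target/portal pair):

# iscsiadm -m node -T iqn.2054-02.com.hp:storage.msa2012i.0390d423d2.a -p 10.54.61.35 --op update -n node.startup -v automatic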

– At this point the block devices should be available on both servers like locally attached devices; you can check it simply by running fdisk:

# fdisk -l

Disk /dev/sdb: 1000.0 GB, 1000000716800 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953126400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  1953118439   976559188+  83  Linux

Disk /dev/sdc: 1000.0 GB, 1000000716800 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953126400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  1953118439   976559188+  83  Linux

In my case /dev/sda is the local disk of the server, and /dev/sdb and /dev/sdc correspond to the iSCSI block devices (one device for each adapter). Now we need to set up device mapper multipathing for these two devices, /dev/sdb and /dev/sdc, so that if one of the adapters fails the LUN device will continue working on our system and multipath will switch the path used for our block device.

Multipath configuration

– First we need to retrieve the unique SCSI identifier (WWID) to use in the multipath configuration, by running the following command against one of the iSCSI devices:

# /lib/udev/scsi_id --whitelisted --device=/dev/sdb
3600c0ff000d823e5ed6a0a4b01000000

– Create the multipath configuration file /etc/multipath.conf with the following content:

##
## This is a template multipath-tools configuration file
## Uncomment the lines relevant to your environment
##
defaults {
       user_friendly_names yes
       polling_interval        3
       selector                "round-robin 0"
       path_grouping_policy    multibus
       path_checker            directio
       failback                immediate
       no_path_retry           fail
}
blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
}

multipaths {
        multipath {
	        # id retrieved with the utility /lib/udev/scsi_id
                wwid                    3600c0ff000d823e5ed6a0a4b01000000
                alias                   nfs
        }
}

– Restart multipath-tools service:

# service multipath-tools restart

– Check again the disks available in the system:

# fdisk -l

Disk /dev/sdb: 1000.0 GB, 1000000716800 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953126400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  1953118439   976559188+  83  Linux

Disk /dev/sdc: 1000.0 GB, 1000000716800 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953126400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  1953118439   976559188+  83  Linux

Disk /dev/mapper/nfs: 1000.0 GB, 1000000716800 bytes
255 heads, 63 sectors/track, 121576 cylinders, total 1953126400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

          Device Boot      Start         End      Blocks   Id  System
/dev/mapper/nfs1              63  1953118439   976559188+  83  Linux

Disk /dev/mapper/nfs-part1: 1000.0 GB, 999996609024 bytes
255 heads, 63 sectors/track, 121575 cylinders, total 1953118377 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

As you can see, we now have a new block device using the alias set in the multipath configuration file: /dev/mapper/nfs. The partition I created and formatted with the filesystem shows up as the block device /dev/mapper/nfs-part1, so you can mount it on your system with the mount utility.
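
For a quick manual test, the partition can be mounted by hand on one node (only one node at a time, since ext3 is not a cluster filesystem; in the final setup this mount is handled by Pacemaker, as covered in the second part):

# mkdir -p /mnt/nfs
# mount /dev/mapper/nfs-part1 /mnt/nfs
# df -h /mnt/nfs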

– You can check the health of the multipath block device and verify that both paths are operational by running the following command:

# multipath -ll
nfs (3600c0ff000d823e5ed6a0a4b01000000) dm-3 HP,MSA2012i
size=931G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdb 8:16 active ready running
  `- 5:0:0:0 sdc 8:32 active ready running

References

https://help.ubuntu.com/14.04/serverguide/device-mapper-multipathing.html

http://linux.dell.com/files/whitepapers/iSCSI_Multipathing_in_Ubuntu_Server.pdf

This post is the first part of the Highly available NFS server series; you can find the second part here.

Varnish – Handling bad responses from backend

Sooner or later servers and applications fail, which is why part of our work is to design and continuously improve a robust, highly reliable architecture that keeps our application available as much of the time as possible. Depending on the application, the software used, the architecture and the environment, we face different challenges and strategies, so a lot depends on multiple factors; in this post I’ll focus on one particular piece for architectures using web servers behind Varnish that can help us improve availability a bit more with a couple of tweaks. As you know, Varnish is an HTTP accelerator used to cache dynamic content from web servers, acting as a proxy between the client and the origin web server. The objective of this post is not to cover the general functionality and configuration of Varnish; you can find very good documentation on its website: https://www.varnish-cache.org/docs.

One of the features of Varnish is its support for saint mode and grace mode. Both features allow us to handle trouble with our web servers and keep the service online even if our backend servers go down. Well, this is only partly true: of course we cannot guarantee the entire service will keep working just with Varnish, but at least we can keep part of our application running.
So imagine a website or API service with thousands of requests per second. Some of them may be POST, DELETE or PUT requests that submit changes to the application; those cannot be handled while the backend servers are down. But for GET requests, where the client wants to obtain information from the service and Varnish has that particular content in its cache, the request can be served perfectly well, returning the content to the client even if the backends are not working. There are two things to bear in mind: the request has to have been cached by Varnish beforehand, and we will return outdated responses to the clients, but that’s better than replying with an error page! As always, whether this behaviour is useful depends on the requirements and the type of application, but most of the time it can save us requests and keep part of the service working in case of failure, so in my opinion it’s highly recommendable if you can use it.

So let’s get started with the configuration of Varnish:

Edit the VCL definition file, usually located in /etc/varnish/default.vcl and edit the following directives:

sub vcl_recv {
    …

    if (req.backend.healthy) {
        set req.grace = 30s;
    } else {
        set req.grace = 6h;
    }

    …
}

sub vcl_fetch {
    …

    if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503) {
        set beresp.saintmode = 10s;

        if (req.request != "POST") {
            return(restart);
        }
    }
    set beresp.grace = 6h;

    …
}

Let’s look in a bit more depth at what this configuration does. “vcl_recv” is called when a request comes in from the client, and the purpose of this subroutine is to decide what to do with that request. In this configuration we’re saying that if the backend servers are healthy we’ll serve content for up to 30 seconds beyond its TTL (“set req.grace = 30s;”), and if the backends become unavailable we’ll keep serving the content for up to 6 hours (“set req.grace = 6h;”). Note that “req.backend.healthy” relies on a health check probe being defined for the backend.

The “vcl_fetch” subroutine is called when a document has been successfully retrieved from the backend. If the backend server returns an HTTP error code of 500, 502 or 503, Varnish will not ask that backend again for this object for 10 seconds (“set beresp.saintmode = 10s;”). “return(restart);” restarts the HTTP request; it will automatically be retried on the next available backend, except for POST requests, to avoid duplicate form submissions and the like. The max_restarts parameter defines the maximum number of restarts that can be issued in VCL before an error is triggered, thus avoiding infinite looping.

“set beresp.grace = 6h;” keeps all objects in the cache for 6 hours longer than their TTL specifies, so even if HTTP objects have expired (they’ve passed their TTL), we can still use them in case all the backends go down.

 

References:

https://www.varnish-software.com/static/book/Saving_a_request.html

AWS Cloudformation: Defining your stack

AWS CloudFormation is a service that allows you to define your infrastructure on AWS in a template. This template is just a JSON file where you define all the resources you need to create in your infrastructure. This is really useful to keep track of all your infrastructure changes under a version control system of your choice, roll back changes and replicate your environment elsewhere in a matter of minutes.

When you define a template, you can think of it as the definition of a stack: a set of logical resources needed to provide a service. For example, imagine a typical architecture for a web application, basically composed of a web layer and a database layer. Depending on the size of the project we’ll need more than one web server to serve the content to the clients, so we’ll need a load balancer to distribute the load across the web servers. The web server layer can be set up under an auto scaling group to scale the number of servers up or down depending on their load. So far we have our basic web stack defined:

– Web server instances.
– Database server instance.
– Auto scaling group for web servers.
– Load balancer.


So, based on the example of the web application, CloudFormation allows us to define all these resources in a JSON file, creating a stack, and CloudFormation will be responsible for creating all the resources for us automatically. After the creation of the stack you can update, add or delete resources by modifying the template and updating the stack, and it’s possible to protect resources that are critical for our service from being modified or deleted by creating a stack policy. Now let’s see the basic anatomy of a CloudFormation template:

{
  "AWSTemplateFormatVersion" : "version date",

  "Description" : "JSON string",

  "Parameters" : {
    set of parameters
  },

  "Mappings" : {
    set of mappings
  },

  "Conditions" : {
    set of conditions
  },

  "Resources" : {
    set of resources
  },

  "Outputs" : {
    set of outputs
  }
}

AWSTemplateFormatVersion: The CloudFormation template format version; most of the time the latest one, “2010-09-09”, is used.
Description: A short explanation of our stack and all its resources.
Parameters: All the parameters passed to our resources at stack creation time, for example the administrator user and password of the database instance, the number of initial instances to launch, an elastic IP to associate with an EC2 instance, etc…
Mappings: A kind of lookup table where you can store key:value definitions and retrieve the value using the intrinsic function Fn::FindInMap. This is useful, for example, when we need to launch EC2 instances using different AMIs based on the region the stack is created in.
Conditions: Statements to conditionally create or associate resources in our stack. For example, imagine we have a stack definition for a test and a production environment that conditionally creates t2.small EC2 instances for the testing environment or m3.xlarge instances for the production environment.
Resources: The central part of our template; here are defined all the resources of the stack, such as S3 buckets, EC2 instances, load balancers, etc…
Outputs: The values returned by the different resources created, for example the URL of an S3 bucket, the DNS record of a load balancer, the elastic IP address associated with an EC2 instance, etc…

CloudFormation is very well documented, so understanding the anatomy of a template and following the documentation and some examples is enough to start working with it. I’ll leave a link to my GitHub repository where I’ve defined a small stack for my environment: https://github.com/opentodonet/cloudformation-templates/blob/master/WebServer-Stack.json

Basically this stack creates an EC2 instance with an elastic IP associated, an RDS database instance, two S3 buckets with lifecycle policies to store backups and logs, a couple of security groups associated with the web server instance and the RDS instance, and an IAM role including policies that grant the new EC2 instance access to EC2, the S3 buckets and CloudFormation resources. Let’s look at the different resources defined in this stack in a bit more detail:
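
For reference, the stack can be validated and created from the command line with the AWS CLI. The stack name and parameter values below are just placeholders (the full parameter list is defined in the template), and --capabilities CAPABILITY_IAM is required because the template creates IAM resources:

# aws cloudformation validate-template --template-body file://WebServer-Stack.json
# aws cloudformation create-stack --stack-name webserver-stack \
    --template-body file://WebServer-Stack.json \
    --parameters ParameterKey=KeyPair,ParameterValue=my-keypair \
                 ParameterKey=DBUser,ParameterValue=admin \
                 ParameterKey=DBPassword,ParameterValue=changeme \
    --capabilities CAPABILITY_IAM
# aws cloudformation describe-stacks --stack-name webserver-stack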

AWS::EC2::SecurityGroup: Creates two security groups: one for the EC2 instance, giving access to the HTTP, HTTPS and SSH services, and one for the RDS instance, giving the whole internal subnet access to port 3306. Note how parameters are referenced using the intrinsic function {"Ref" : "VpcId"}, which associates the security groups with the VPC id passed as a parameter.

  "WebServerSecurityGroup" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Enable HTTP, HTTPS and SSH access",
        "VpcId" : {"Ref" : "VpcId"},
        "SecurityGroupIngress" : [
          {"IpProtocol" : "tcp", "FromPort" : "80", "ToPort" : "80", "CidrIp" : "0.0.0.0/0"},
          {"IpProtocol" : "tcp", "FromPort" : "443", "ToPort" : "443", "CidrIp" : "0.0.0.0/0"},
          {"IpProtocol" : "tcp", "FromPort" : "22", "ToPort" : "22", "CidrIp" : "0.0.0.0/0"}
        ]
      }
    },
    "DBEC2SecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Frontend Access",
        "VpcId"            : {"Ref" : "VpcId"},
        "SecurityGroupIngress" : [{
          "IpProtocol" : "tcp",
          "FromPort"   : { "Ref" : "DBPort" },
          "ToPort"     : { "Ref" : "DBPort" },
          "CidrIp"     : "172.16.0.0/16"
        }]
      }
    },

AWS::S3::Bucket: Here the two buckets for logs and backups are defined. The DeletionPolicy ensures that if the stack is deleted the S3 buckets will be preserved. AccessControl defines the ACL for access to the bucket; in this case both are private. LifecycleConfiguration allows you to create a lifecycle policy to apply to the bucket; in this case both buckets will remove files older than 15 or 30 days, but you could also set it up to archive files to AWS Glacier, for example.

  "S3BackupBucket" : {
      "Type" : "AWS::S3::Bucket",
      "DeletionPolicy" : "Retain",
      "Properties" : {
        "AccessControl" : "Private",
        "BucketName" : "opentodo-backups",
        "LifecycleConfiguration" : {
          "Rules" : [ {
            "ExpirationInDays" : 15,
            "Status" : "Enabled"
          } ]
        }
      }
    },
    "S3LogBucket" : {
      "Type" : "AWS::S3::Bucket",
      "DeletionPolicy" : "Retain",
      "Properties" : {
        "AccessControl" : "Private",
        "BucketName" : "opentodo-logs",
        "LifecycleConfiguration" : {
          "Rules" : [ {
            "ExpirationInDays" : 30,
            "Status" : "Enabled"
          } ]
        }
      }
    }

AWS::IAM::Role: Allows the instance to make API requests to AWS services without using access and secret keys, by using Temporary Security Credentials. This role creates different policies giving access to the S3 backup and log buckets, EC2 and CloudFormation resources.

  "WebServerRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version" : "2012-10-17",
          "Statement": [ {
            "Effect": "Allow",
            "Principal": {
              "Service": [ "ec2.amazonaws.com" ]
             },
             "Action": [ "sts:AssumeRole" ]
          } ]
        },
        "Path": "/",
        "Policies": [
          { "PolicyName": "EC2Access",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": ["ec2:*","autoscaling:*"],
                "Resource": "*"
              } ]
            }
          },
          { "PolicyName": "S3Access",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": ["arn:aws:s3:::opentodo-backups","arn:aws:s3:::opentodo-backups/*","arn:aws:s3:::opentodo-logs","arn:aws:s3:::opentodo-logs/*"]
              } ]
            }
          },
          { "PolicyName": "CfnAccess",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": ["cloudformation:DescribeStackResource"],
                "Resource": "*"
              } ]
            }
          }
        ]
      }
    },

AWS::IAM::InstanceProfile: References the IAM role; this is just a container for the IAM role that allows it to be assigned to an EC2 instance.

  "WebServerInstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [ {
        "Ref": "WebServerRole"
         } ]
      }
    },

AWS::EC2::Instance: Creates the EC2 instance using the AMI id ami-df1696a8, assigns the InstanceProfile defined before, places it in the subnet id subnet-7d59d518 and uses the instance size and key pair passed as parameters. The UserData property allows you to set up scripts to run during the startup process. The commands passed in the user data are run by the cloud-init service, which is included in the public AMIs provided by AWS. The user data set up here installs the python-setuptools package and the CloudFormation Helper Scripts, a set of Python scripts to install packages, run commands, create files or start services as part of the CloudFormation stack on the EC2 instances. The cfn-init command fetches the CloudFormation metadata to check which tasks the instance has to run (that’s why we included the cloudformation:DescribeStackResource permission in the IAM role before). The CloudFormation metadata is defined under the AWS::CloudFormation::Init key, which basically installs some packages, including the awscli tool, and creates a couple of files: /root/.my.cnf to access the RDS instance, filled in with the attributes obtained after creating the RDS instance, and /etc/bash_completion.d/awscli for awscli auto-completion. The cfn-signal command in the user data is used to indicate whether the EC2 instance has been successfully created or updated; it is handled by the CreationPolicy attribute, which waits until cfn-init has finished, with a timeout of 5 minutes.

  "WebServerEc2Instance" : {
      "Type" : "AWS::EC2::Instance",
        "Metadata" : {
        "AWS::CloudFormation::Init" : {
          "config" : {
            "packages" : {
              "apt" : {
                "nginx" : [],
                "php5-fpm" : [],
                "git" : [],
                "etckeeper" : [],
                "fail2ban" : [],
                "mysql-client" : []
              },
              "python" : {
                "awscli" : []
              }
            },
            "files" : {
              "/root/.my.cnf" : {
                "content" : { "Fn::Join" : ["", [
                  "[client]\n",
                  "user=", { "Ref" : "DBUser" }, "\n",
                  "password=", { "Ref" : "DBPassword" }, "\n",
                  "host=", { "Fn::GetAtt" : [ "DBInstance", "Endpoint.Address" ] }, "\n",
                  "port=", { "Fn::GetAtt" : [ "DBInstance", "Endpoint.Port" ] }, "\n"
                ] ] },
                "mode"  : "000600",
                "owner" : "root",
                "group" : "root"
              },
              "/etc/bash_completion.d/awscli" : {
                "content" : { "Fn::Join" : ["", [
                  "complete -C aws_completer aws\n"
                ] ] },
                "mode"  : "000644",
                "owner" : "root",
                "group" : "root"
              }
            }
          }
        }
      },
      "Properties" : {
        "ImageId" : "ami-df1696a8",
        "InstanceType"   : { "Ref" : "InstanceType" },
        "SecurityGroupIds" : [ {"Ref" : "WebServerSecurityGroup"} ],
        "KeyName"        : { "Ref" : "KeyPair" },
        "IamInstanceProfile" : { "Ref" : "WebServerInstanceProfile" },
        "SubnetId" : "subnet-7d59d518",
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "#!/bin/bash\n",
                "aptitude update\n",
                "aptitude -y install python-setuptools\n",
                "easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz\n",
                "# Install the files and packages from the metadata\n",
                "cfn-init --stack ", { "Ref" : "AWS::StackName" }," --resource WebServerEc2Instance --region ", { "Ref" : "AWS::Region" }, "\n",
                "# Signal the status from cfn-init\n",
                "cfn-signal -e $? ","--stack ", { "Ref" : "AWS::StackName" }," --resource WebServerEc2Instance --region ", { "Ref" : "AWS::Region" }, "\n"
              ]
            ]
          }
        }
      },
      "CreationPolicy" : {
        "ResourceSignal" : {
          "Timeout" : "PT5M"
        }
      }
    }

AWS::EC2::EIPAssociation: Associates the elastic IP passed as a parameter with the EC2 instance. The elastic IP must have been allocated on AWS beforehand.

  "EIPAssociation" : {
      "Type" : "AWS::EC2::EIPAssociation",
      "Properties" : {
        "InstanceId" : {"Ref" : "WebServerEc2Instance"},
        "EIP" : {"Ref" : "ElasticIP"}
      }
    },

AWS::RDS::DBSubnetGroup: Creates a DB subnet group, using the defined subnet ids, where the RDS instance will be placed.

  "DBSubnetGroup" : {
      "Type" : "AWS::RDS::DBSubnetGroup",
      "Properties" : {
        "DBSubnetGroupDescription" : "WebServer DB subnet group",
        "SubnetIds" : [ "subnet-058c0560", "subnet-2072c457" ]
      }
    },

AWS::RDS::DBInstance: Creates the RDS instance in the subnet group created before, with some properties passed as parameters.

  "DBInstance" : {
      "Type": "AWS::RDS::DBInstance",
      "Properties": {
        "DBInstanceIdentifier" : "WebServerRDS",
        "Engine"            : "MySQL",
        "MultiAZ"           : { "Ref": "MultiAZDatabase" },
        "MasterUsername"    : { "Ref" : "DBUser" },
        "MasterUserPassword": { "Ref" : "DBPassword" },
        "DBInstanceClass"   : { "Ref" : "DBClass" },
        "AllocatedStorage"  : { "Ref" : "DBAllocatedStorage" },
        "DBSubnetGroupName" : { "Ref" : "DBSubnetGroup" },
        "Port"              : { "Ref" : "DBPort" },
        "StorageType" : "gp2",
        "AutoMinorVersionUpgrade" : "true",
        "BackupRetentionPeriod" : 5,
        "PreferredBackupWindow" : "02:30-03:30",
	"PreferredMaintenanceWindow" : "sun:04:30-sun:05:30",
        "VPCSecurityGroups": [ { "Fn::GetAtt": [ "DBEC2SecurityGroup", "GroupId" ] } ]
      }
    },

As I said, AWS documentation is very good, so everything you need can be found in the doc pages, along with very useful examples:

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-sample-templates.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-init.html

– Github repository with the full template:

https://github.com/opentodonet/cloudformation-templates/

Updates, scripts and other stuff

Hey guys! This post will be a bit different from the rest; it’s just an update on my recent work on GitHub, plus some interesting reading I’d like to share. It’s been some time since I last posted, and these days I uploaded some of the work I’ve done to GitHub (some scripts and Puppet modules). Some of it was done a while ago, so it was partly a matter of collecting work I had already done and sharing it as usual, in case it’s useful to someone else. Of course, proposals for improvements to any of the projects, and new ideas, are very welcome 🙂

    • New repo sysadmin-scripts: In this repo I added some already existing scripts I had in other repositories (linked as submodules) and uploaded others I hadn’t published yet. I decided to create this repo to collect useful scripts that I use in my day-to-day work, and I’ll keep it updated, adding new ones as I need them.
    • New repo puppet-manifests: The same idea as the previous repository, but with some modules for Puppet. There I upload manifests that can be useful for very common environments. I added some submodules as well, pointing to Puppet modules required by my manifests. I’ll keep it updated, adding more of my Puppet modules.
    • New script tweet-planet-reports: This is a small script I recently wrote for the Spanish sysadmin planet planetasysadmin.com. It’s a simple script that counts the number of contributions made by each blog on the planet and posts the result to the Twitter account. Useful for people who manage RSS planets and want to know the activity of the member blogs. On the to-do list: I had thought about including in the reports a performance comparison against the last report executed for each blog. Maybe if I’ve time… :D:D
    • Improved script check_http_requests: I included the following information from the access log file generated by apache2 / nginx:
      Show top 10 source IPs.
      Show top 10 requested pages.
      Show percentage of successful and bad responses.
      Show top 5 requested pages per source IP.
    • New Perl module Redis-Interface-Client: This small module is just an interface on top of the existing Redis one, but I’ve included some methods to work with data structures more easily, and methods like replace, append or add that set a key under certain conditions; just check the documentation to see what each one does. In the future I’d like to extend it to work with more complex data structures like hashes of hashes or arrays of arrays, we’ll see 😀

Those have been my latest updates on GitHub. Now I’d like to share some useful reading I’ve done; I found some of it while investigating an issue, and the rest I just saw on Twitter / RSS:

  • A good read about the TIME_WAIT TCP state, how it works and why you should think before touching the sysctl parameter net.ipv4.tcp_tw_recycle. By the way, a good blog to follow 😀

http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html

  • A good optimization guide for nginx:

http://blog.zachorr.com/nginx-setup/

  • A good explanation of the differences between kmod and akmod:

https://www.foresightlinux.se/difference-akmod-kmod/

  • Good tutorials for people who are getting started with the Perl Catalyst framework:

http://www.catalystframework.org/calendar/

https://metacpan.org/pod/Catalyst::Manual::Tutorial

Well, I hope this info is useful for you, and see you next time with more interesting stuff!! 😛