This article describes how High Availability (HA) can be configured with Pacemaker & Heartbeat for Apache Stratos. In general this concept can be applied for any server application which needs HA and does not require any data replication. If data replication is needed you may need to consider using DRBD with Pacemaker. First of all we will see what Pacemaker and Heartbeat are and go through a series of steps on configuring those.

What is Pacemaker?

Pacemaker is a Cluster Resource Manager (CRM) which can detect and recover from failures of nodes and resources. It basically can start, stop, check the status of a resource and take decisions for recovering them from failures.

What is a resource? A resource either can be a server application, an IP address or any other software/hardware resource that you can think of. These resources are managed through Resource Agents which is an abstraction layer that Pacemaker make use of to communicate with different types of resources. Out of the box Pacemaker provides Resource Agents for OCF and LSB services. In this example we will be using LSB Resource Agent to manage Apache Stratos as an init.d service.

What is Heartbeat?

Heartbeat is a daemon that provides messaging infrastructure for Pacemaker. It manages the communication between nodes and allows to know the presence of resources in the cluster.

Prerequisites

Oracle VirtualBox or any other virtualization technology
Ubuntu 12.04 server (64-bit) virtual machine image
Pacemaker 1.1.6 or above
Heartbeat 3.0.5 or above

Steps for Configuring Pacemaker & Heartbeat for Apache Stratos:

Start two instances of Ubuntu 12.04 server virtual machines.
Switch to root user: sudo su
Install Pacemaker and Heartbeat
```
apt-get install pacemaker heartbeat
```
Create Heartbeat configuration file at the following location: /etc/ha.d/ha.cf

    # enable pacemaker, without stonith
    crm yes
    # define log file
    logfile /var/log/ha-log
    # warning of soon be dead
    warntime 10
    # declare a host (the other node) dead after:
    deadtime 20
    # dead time on boot (could take some time until net is up)
    initdead 120
    # time between heartbeats
    keepalive 2
    # the nodes
    node node1 # set node1 hostname
    node node2 # set node2 hostname
    # heartbeats, over dedicated replication interface
    ucast eth1 10.186.175.16 # set node1 network-interface and ip address
    ucast eth1 54.211.110.217 # set node2 network-interface and ip address

Create authentication key file and set permissions in one of the hosts:

( echo -ne "auth 1\n1 sha1 "; \
dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
> /etc/ha.d/authkeys
chmod 0600 /etc/ha.d/authkeys

Copy the above authkeys file to each host (/etc/ha.d/authkeys).
Restart heartbeat service: service heartbeat restart

Now check the status of the Pacemaker cluster using CRM, here all nodes in the cluster should be in online state. If not check the heartbeat configuration again.

crm status
============
Last updated: Wed Oct 15 11:25:05 2014
Last change: Wed Oct 15 11:21:51 2014 via crmd on ip-10-186-175-16
Stack: Heartbeat
Current DC: ip-10-186-175-16 (d16ccc5c-2641-42b6-b46a-57a0b32fddc9) - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
0 Resources configured.
============
Online: [ ip-10-186-175-16 ip-10-153-165-178 ]

Disable STONITH: crm configure property stonith-enabled=false

Create a Failover IP resource to manage the virtual IP address:

crm configure primitive FAILOVER-IP ocf:heartbeat:IPaddr params ip=192.168.10.20 cidr_netmask="255.255.255.0" op monitor interval=10s

SCP java and Apache Stratos packages to both hosts and extract them under /opt folder.

Create an init.d script for Stratos using following code and update USER, JAVA_HOME and PRODUCT_HOME variable values:

```` https://gist.github.com/imesh/5256272cd71b74a06581

#!/bin/sh

BEGIN INIT INFO

Provides: stratos

Required-Start: $local_fs $remote_fs $network $syslog $named

Required-Stop: $local_fs $remote_fs $network $syslog $named

Default-Start: 2 3 4 5

Default-Stop: 0 1 6

X-Interactive: true

Short-Description: Start/stop stratos server

END INIT INFO

USER=“vagrant” PRODUCT_NAME=“stratos” JAVA_HOME=“/opt/jdk1.7.0_60” PRODUCT_HOME=“/opt/apache_stratos_4.1.0_SNAPSHOT” PID_FILE=“${PRODUCT_HOME}/wso2carbon.pid” CMD=“${PRODUCT_HOME}/bin/stratos.sh”

LSB exit codes:

ftp://ftp.nomadlinux.com/nomad-2/dist/heartbeat-1.2.5/include/clplumbing/lsb_exitcodes.h

LSB_EXIT_OK=0 LSB_EXIT_GENERIC=1 LSB_EXIT_EINVAL=2 LSB_EXIT_ENOTSUPPORTED=3 LSB_EXIT_EPERM=4 LSB_EXIT_NOTINSTALLED=5 LSB_EXIT_NOTCONFIGED=6 LSB_EXIT_NOTRUNNING=7

is_service_running() { if [ -e ${PID_FILE} ]; then PID=cat ${PID_FILE} if ps -p $PID >&- ; then # service is running return 0 else # service is stopped return 1 fi else # pid file was not found, may be server was not started before return 1 fi }

Status the service

status() { is_service_running service_status=$?

if [ "${service_status}" -eq 0 ]; then
    echo "${PRODUCT_NAME} service is running"
    return ${LSB_EXIT_OK}
    elif [ "${service_status}" -eq 1 ]; then
    echo "$PRODUCT_NAME service is stopped"
    return ${LSB_EXIT_OK}
    else
    echo "$PRODUCT_NAME service status is unknown"
    return ${LSB_EXIT_GENERIC}
    fi
}


# Start the service
start() {
    if is_service_running; then
    echo "${PRODUCT_NAME} service is already running"
    return ${LSB_EXIT_OK}
    fi


    echo "starting ${PRODUCT_NAME} service..."
    su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} start"


    is_service_running
    service_status=$?
    while [ "$service_status" -ne "0" ]
    do
    sleep 1;
    is_service_running
    service_status=$?
    done


    echo "${PRODUCT_NAME} service started"
    return ${LSB_EXIT_OK}
}


# Restart the service
restart() {
    echo "restarting ${PRODUCT_NAME} service..."
    su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} restart"
    echo "${PRODUCT_NAME} service restarted"
    return ${LSB_EXIT_OK}
}


# Stop the service
stop() {
    if ! is_service_running; then
    echo "${PRODUCT_NAME} service is already stopped"
    return ${LSB_EXIT_OK}
    fi


    echo "stopping ${PRODUCT_NAME} service..."
    su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} stop"


    is_service_running
    service_status=$?
    while [ "$service_status" -eq "0" ]
    do
    sleep 1;
    is_service_running
    service_status=$?
    done


    echo "${PRODUCT_NAME} service stopped"
    return ${LSB_EXIT_OK}
}
### main logic ###
case "$1" in
start)
    start
;;
stop|graceful-stop)
    stop
;;
status)
    status
;;
restart|reload|force-reload)
    restart
;;
*)
echo $"usage: $0 {start|stop|graceful-stop|restart|reload|force-reload|status}"
exit 1
esac
exit $?
````

Create a CRM resource for stratos:

crm configure primitive STRATOS lsb::stratos op monitor interval=15s

Create a CRM resource group and add FAILOVER-IP and STRATOS resources:

crm configure group FAILOVER-IP-RESOURCE-GROUP FAILOVER-IP STRATOS

Configure a colocation dependency between FAILOVER-IP and STRATOS. This will make sure that both FAILOVER-IP and STRATOS resources will stay in the same host.

crm configure colocation FAILOVER-IP-RESOURCE-GROUP-COLOCATION inf: FAILOVER-IP STRATOS

Menu

Providing High Availability for Stratos with Pacemaker & Heartbeat