Elasticsearch 2.3.1 cluster with docker

How to deploy an Elasticsearch 2.3.1 cluster using docker


We will deploy:

  1. Elasticsearch 2.3.1
  2. Two nodes
  3. Authentication enabled via NGINX proxy
  4. Persistent data on each node's local file system

To follow this tutorial you must have docker installed on your servers or VMs. You can find instructions to do so here.
I'll also assume you can run docker without sudo and that you are using Debian or one of its derivatives.

Official Elasticsearch cluster documentation can be found here.

Step One

Get the IPs of the two servers by running the following command on each one:

ifconfig eth0:1 | grep "inet addr" | cut -d: -f2 | awk '{print $1}'  

(If you are using a network interface other than eth0:1, make sure to modify the above command accordingly.)

Then export them on every machine:

yourUsername@yourServerName1:~$ export node1=192.168.206.177  
yourUsername@yourServerName2:~$ export node2=192.168.207.165  

(Make sure to change the IP addresses to match your servers' before exporting.)

In a production environment also make sure each of the servers is accessible by way of resolvable DNS or hostnames. Either set up '/etc/hosts' to reflect this configuration or configure your DNS names.
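
For example, on yourServerName1 the /etc/hosts entry for the other node would look something like this (mirror it on yourServerName2 with the first server's name and IP):

```
192.168.207.165   yourServerName2.yourDomain.com   yourServerName2
```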

Step Two

For this blog post I'll use the ~/docker/elasticsearch directory. Create it on both servers:

yourUsername@yourServerName1:~$ mkdir -p ~/docker/elasticsearch  
yourUsername@yourServerName2:~$ mkdir -p ~/docker/elasticsearch  

Step Three

On yourServerName1, start the Elasticsearch docker container with:

docker run --name="esNode1" -p 9300:9300 --hostname="yourServerName1" \  
--add-host yourServerName2.yourDomain.com:192.168.207.165 \
-v "$PWD/docker/elasticsearch/data":/usr/share/elasticsearch/data \
-v "$PWD/docker/elasticsearch/plugins":/usr/share/elasticsearch/plugins \
--env="ES_HEAP_SIZE=8g" \
-d elasticsearch:2.3.1 \
-Des.node.name="esNode1" \
-Des.network.host=_eth0:ipv4_ \
-Des.network.bind_host=0.0.0.0 \
-Des.cluster.name=yourClusterName \
-Des.network.publish_host=192.168.206.177 \
-Des.discovery.zen.ping.multicast.enabled=false \
-Des.discovery.zen.ping.unicast.hosts=192.168.207.165 \
-Des.discovery.zen.ping.timeout=3s \
-Des.discovery.zen.minimum_master_nodes=1

and on yourServerName2, start the Elasticsearch docker container with:

docker run --name="esNode2" -p 9300:9300 --hostname="yourServerName2" \  
--add-host yourServerName1.yourDomain.com:192.168.206.177 \
-v "$PWD/docker/elasticsearch/data":/usr/share/elasticsearch/data \
-v "$PWD/docker/elasticsearch/plugins":/usr/share/elasticsearch/plugins \
--env="ES_HEAP_SIZE=8g" \
-d elasticsearch:2.3.1 \
-Des.node.name="esNode2" \
-Des.network.host=_eth0:ipv4_ \
-Des.network.bind_host=0.0.0.0 \
-Des.cluster.name=yourClusterName \
-Des.network.publish_host=192.168.207.165 \
-Des.discovery.zen.ping.multicast.enabled=false \
-Des.discovery.zen.ping.unicast.hosts=192.168.206.177 \
-Des.discovery.zen.ping.timeout=3s \
-Des.discovery.zen.minimum_master_nodes=1

The --add-host option is used to edit /etc/hosts inside the Elasticsearch docker container, so we can use hostnames instead of IPs. In a production environment these entries can be resolved via DNS, so those lines could be skipped.

The -v options let us choose where the Elasticsearch container's data and plugins directories are mounted on the local file system. They are what give you persistence outside the docker container.

The -d option runs the container detached (in the background), and elasticsearch:2.3.1 picks the image and version to pull from Docker Hub.

The -Des.* arguments are all configuration options passed to Elasticsearch.
Some are self explanatory, such as -Des.node.name="esNode2" and -Des.cluster.name=yourClusterName, but others might require further explanation.
One caveat: with two master-eligible nodes the usual recommendation is minimum_master_nodes=2 (masterEligibleNodes / 2 + 1) to prevent split-brain; leaving it at 1, as here, keeps the cluster available when one node is down, at the cost of that protection.
Check out the following links to learn more about network settings and discovery.
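
For reference, each -Des.* flag maps directly onto an elasticsearch.yml setting; the node 1 command above is equivalent to this configuration fragment:

```yaml
cluster.name: yourClusterName
node.name: esNode1
network.host: _eth0:ipv4_
network.bind_host: 0.0.0.0
network.publish_host: 192.168.206.177
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.207.165"]
discovery.zen.ping.timeout: 3s
discovery.zen.minimum_master_nodes: 1
```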

A good rule of thumb is to set heap size to half of your memory, but don't cross 32GB if you are lucky enough to have that many. Also disable swap on your servers. Learn why from the official Elasticsearch documentation about heap and swap.
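
That rule of thumb can be sketched as a quick shell computation (assuming Linux with /proc/meminfo; the 31g cap is my own conservative stand-in for "don't cross 32GB"):

```shell
# Half of physical RAM, capped at 31g to stay safely below the
# 32GB compressed-oops threshold.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
half_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$half_gb" -gt 31 ]; then half_gb=31; fi
if [ "$half_gb" -lt 1 ]; then half_gb=1; fi
echo "ES_HEAP_SIZE=${half_gb}g"
```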

To disable swap:

sudo swapoff -a  

and also edit /etc/fstab and comment out all lines where swap is present.
If disabling swap completely is not an option, there are other techniques described in the link above that might work for you.
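
One such technique, if you'd rather keep some swap around, is telling the kernel to swap only as a last resort by adding this line to /etc/sysctl.conf (apply it immediately with sudo sysctl -p):

```
vm.swappiness = 1
```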

We now have a fully working Elasticsearch 2.3.1 cluster, but it is totally exposed and unprotected: anyone can not only access your data, but also erase it all with ease.
In the next steps we are going to see how to set up access control for our cluster, using NGINX as a proxy with basic authentication.
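
Before adding the proxy, it's worth checking that the two nodes actually formed a single cluster. A quick sketch, assuming the local container got the default docker bridge address 172.17.0.2 (the same address the NGINX upstream uses later):

```shell
# Ask the local node for cluster health and pull out the node count.
health=$(curl -s "http://172.17.0.2:9200/_cluster/health")
echo "$health" | grep -o '"number_of_nodes":[0-9]*'
# With both containers up this should print "number_of_nodes":2
```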

Step Four

If you don't already have NGINX installed, do it now on both servers:

sudo apt-get install nginx  

We need to generate two password files: one for standard users and another for administrators. We can do this with openssl, but then we are limited to 8-character passwords, or we can use apache2-utils, which has no such limit. Choose what's best for you; I used the latter.
Also remember to pick two meaningful usernames, for example stduser and admin.

If you went the openssl route:

printf "stduser:$(openssl passwd -crypt sup3rs3cr3t)" > search_users.htpasswd  
printf "admin:$(openssl passwd -crypt ub3rs3cr3t)" > search_admins.htpasswd  

(the file names must match the auth_basic_user_file paths we'll reference in the NGINX configuration). Change their owner and group to root and move them to /etc/nginx/conf.d/:

sudo chown root:root search_users.htpasswd search_admins.htpasswd  
sudo mv search_users.htpasswd search_admins.htpasswd /etc/nginx/conf.d/  

Otherwise you'll need to install apache2-utils first, if it's not already installed:

sudo apt-get install apache2-utils  

and then generate the password files:

sudo htpasswd -c /etc/nginx/conf.d/search_users.htpasswd stduser  
sudo htpasswd -c /etc/nginx/conf.d/search_admins.htpasswd admin  

Step Five

Let's create an NGINX configuration file on each server and open it with an editor. I use vim:

sudo vim /etc/nginx/sites-available/elastic  

On yourServerName1, insert these lines:

upstream elasticsearch {  
  server 172.17.0.2:9200;
  server 192.168.207.165:9200;
  keepalive 15;
}

server {  
    listen 8081 default_server;
    listen [::]:8081 default_server ipv6only=on;

    server_name yourServerName1.yourDomain.com;

    location / {
      return 403;
    }

    location ~* /[a-zA-Z0-9_]*[a-zA-Z0-9,_]*/(health|_health|state|stats) {
      return 405;
    }

    location ~* (/_search|/_analyze|_mget)$ {
      if ( $request_method !~ ^(GET|HEAD)$ ) {
        return 405;
      }

      if ( $request_uri = /_health ) {
        return 405;
      }

      if ( $request_uri = /_bulk ) {
        return 405;
      }

      auth_basic "Elasticsearch Users";
      auth_basic_user_file /etc/nginx/conf.d/search_users.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

server {  
    listen 8082 default_server;
    listen [::]:8082 default_server ipv6only=on;

    server_name yourServerName1.yourDomain.com;

    location / {
      auth_basic "Elasticsearch Admins";
      auth_basic_user_file /etc/nginx/conf.d/search_admins.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

On yourServerName2, insert these lines instead:

upstream elasticsearch {  
  server 172.17.0.2:9200;
  server 192.168.206.177:9200;
  keepalive 15;
}

server {  
    listen 8081 default_server;
    listen [::]:8081 default_server ipv6only=on;

    server_name yourServerName2.yourDomain.com;

    location / {
      return 403;
    }

    location ~* /[a-zA-Z0-9_]*[a-zA-Z0-9,_]*/(health|_health|state|stats) {
      return 405;
    }

    location ~* (/_search|/_analyze|_mget)$ {
      if ( $request_method !~ ^(GET|HEAD)$ ) {
        return 405;
      }

      if ( $request_uri = /_health ) {
        return 405;
      }

      if ( $request_uri = /_bulk ) {
        return 405;
      }

      auth_basic "Elasticsearch Users";
      auth_basic_user_file /etc/nginx/conf.d/search_users.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

server {  
    listen 8082 default_server;
    listen [::]:8082 default_server ipv6only=on;

    server_name yourServerName2.yourDomain.com;

    location / {
      auth_basic "Elasticsearch Admins";
      auth_basic_user_file /etc/nginx/conf.d/search_admins.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

As you can see, the NGINX configuration files are pretty similar. They only differ in server_name and in the upstream section: each proxy lists the local container (via the docker bridge IP) and the other node. Note that for the cross-server upstream entry to work, port 9200 on the other node must be reachable from the proxy (for example by also publishing it with -p 9200:9200 and firewalling it so only the peer can connect).
Now we need to enable the configurations we just created on both servers:

sudo ln -s /etc/nginx/sites-available/elastic /etc/nginx/sites-enabled/elastic  

and then reload the NGINX configuration, again on both servers:

sudo service nginx reload  

We have just set up a simple load balancer, thanks to the upstream directive, while allowing access only to authenticated users, even with different roles and permissions.

On port 8081 we only allow GET and HEAD requests, and only to endpoints ending in _search, _analyze or _mget. In other words, we only allow retrieving data, not modifying or deleting existing data or inserting new data. That's what regular entitled users will use.
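
To see which request paths the users-port rules admit, you can exercise the same pattern as the location block with grep -E (a sketch; myindex is a hypothetical index name):

```shell
# The same regex used by the nginx location block above.
pattern='(/_search|/_analyze|_mget)$'

# Paths ending in /_search, /_analyze or _mget are proxied on...
echo "/myindex/_search" | grep -qE "$pattern" && echo "proxied"
# ...anything else, such as bulk writes, never reaches Elasticsearch.
echo "/myindex/_bulk" | grep -qE "$pattern" || echo "rejected"
```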

On port 8082 we are allowed to do anything we like. That is, after all, the admin account we'll use to manage our cluster.

Step Six

It is usually handy to have an upstart script or something equivalent to manage your docker container instances.

On node1 (the one running on yourServerName1):

sudo vim /etc/init/esNode1.conf  

and insert those lines:

description "Elasticsearch 2.3.1 node 1"  
author "yourMailUsername@yourDomain.com"  
start on filesystem and started docker  
stop on runlevel [!2345]  
respawn  
script  
    /usr/bin/docker start -a esNode1
end script  

and on node2 (the one running on yourServerName2):

sudo vim /etc/init/esNode2.conf  

and insert those lines:

description "Elasticsearch 2.3.1 node 2"  
author "yourMailUsername@yourDomain.com"  
start on filesystem and started docker  
stop on runlevel [!2345]  
respawn  
script  
    /usr/bin/docker start -a esNode2
end script  

With those upstart scripts in place, you can issue commands in the form:

sudo service serviceName status|start|stop|restart  

So, for example, if we'd like to know whether Elasticsearch is up and running on yourServerName1, we'd type:

sudo service esNode1 status  

and if it is up and running it will output something like:

esNode1 start/running, process 23163  

Note that if your docker containers were already running when you created the upstart scripts, you will need to stop them manually with:

yourUsername@yourServerName1:~$ docker stop esNode1  
yourUsername@yourServerName2:~$ docker stop esNode2  

and then start them with the upstart scripts:

yourUsername@yourServerName1:~$ sudo service esNode1 start  
yourUsername@yourServerName2:~$ sudo service esNode2 start  

From this moment on, upstart will be responsible for keeping your docker containers running and for restarting them when the servers reboot.

Conclusion

We now have a fully operational Elasticsearch 2.3.1 cluster running with docker! Take a tour of the official documentation to learn how to create indexes and mappings and then import or insert some data.

In an upcoming post we'll explore how to create a very fast autocomplete box using Elasticsearch.