Lessons in Elasticsearch

Elasticsearch and EC2 are well suited to each other in production deployments. However, there are a few considerations to keep in mind when setting up your Elasticsearch cluster.

Below are three high-level lessons our team learned while deploying Elasticsearch to Amazon’s Elastic Compute Cloud (EC2).

If you have a basic understanding of Elasticsearch, read on. If you need more background, see the ‘Getting Started’ chapter of the Elasticsearch guide.

The Basics

Elasticsearch is a clustered indexing and search engine built on top of Apache Lucene, used for natural language processing and advanced search. It exposes a simple RESTful API that supports quick, basic searches as well as advanced capabilities such as complex query filtering and document indexing.
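To make the RESTful API a little more concrete, here is a minimal Python sketch that builds (but does not send) a search request combining a full-text ‘match’ clause with an exact ‘term’ filter. The host, the index name ‘logs’ and the field names ‘message’ and ‘status’ are hypothetical, and the ‘bool’/‘filter’ form assumes Elasticsearch 2.x or later (1.x used the ‘filtered’ query for the same purpose).

```python
import json
from urllib.request import Request


def build_search_request(host: str, index: str, text: str, status: str) -> Request:
    """Build (without sending) a POST /<index>/_search request using the Query DSL."""
    body = {
        "query": {
            "bool": {
                # Full-text clause: the text is analyzed and results are scored.
                "must": [{"match": {"message": text}}],
                # Exact-match clause: narrows the results without affecting the score.
                "filter": [{"term": {"status": status}}],
            }
        },
        "size": 10,
    }
    return Request(
        url=f"http://{host}:9200/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to ‘urllib.request.urlopen’ would actually send it to a node listening on port 9200.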

Elasticsearch is the search engine of choice in thousands of production environments, largely due to its scalability and schemaless document storage. The largest production environments are on the order of petabytes. Elasticsearch is most commonly used to add search functionality to an existing database or document store. It is also widely used to make server logs highly accessible and redundant, usually as part of the prototypical ‘ELK Stack’ (Elasticsearch, Logstash and Kibana).

  • Leverage the Elasticsearch AWS Cloud Plugin

Elasticsearch nodes practically cluster themselves. Simply provide a cluster name (or leave it undefined to use the default, ‘elasticsearch’) and the nodes will advertise their presence using broadcast and multicast packets over the network. However, like most cloud providers, Amazon Web Services does not support broadcast or multicast – so what’s a node to do?

Enter the AWS Cloud Plugin for Elasticsearch, which works around AWS’s lack of multicast and broadcast support by discovering other nodes through the AWS EC2 API. The bad news is that its main logic simply asks for a list of all instances it can see and attempts to connect to port 9300 on each of them, which causes issues in environments larger than 20 instances. Be sure to tag your instances and filter the list by that tag!

The AWS Cloud Plugin can also automatically generate node attributes describing each node’s place in EC2, such as which Availability Zone it is in. Combined with shard allocation awareness, this helps keep your shard allocation even across AZs, making your cluster more resilient. You can also snapshot your indices to S3; if you do, use IAM roles so that you don’t have to hardcode credentials into your configuration.
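As a rough illustration, the Python sketch below renders the flat ‘elasticsearch.yml’ settings one might expect for this setup. The setting names are those documented for the legacy ‘cloud-aws’ plugin of the Elasticsearch 1.x/2.x era (later versions moved EC2 discovery into the ‘discovery-ec2’ plugin and renamed several settings), so check them against your version; the cluster name, region, tag key and tag value used here are hypothetical.

```python
from typing import Dict, Union

Setting = Union[str, bool]


def ec2_discovery_settings(cluster_name: str, region: str,
                           tag_key: str, tag_value: str) -> Dict[str, Setting]:
    """Flat elasticsearch.yml settings for EC2 discovery with the legacy cloud-aws plugin."""
    return {
        # Nodes only join a cluster with the same name.
        "cluster.name": cluster_name,
        # Region whose EC2 API is queried for candidate nodes.
        "cloud.aws.region": region,
        # Discover peers through the EC2 API instead of multicast.
        "discovery.type": "ec2",
        # Only try instances carrying this tag, not every instance the API returns.
        f"discovery.ec2.tag.{tag_key}": tag_value,
        # Add an aws_availability_zone attribute to each node...
        "cloud.node.auto_attributes": True,
        # ...and ask the allocator to spread copies of each shard across those zones.
        "cluster.routing.allocation.awareness.attributes": "aws_availability_zone",
        # No cloud.aws.access_key / cloud.aws.secret_key: credentials are expected
        # to come from the instance's IAM role instead of being hardcoded.
    }


def render_yaml(settings: Dict[str, Setting]) -> str:
    """Render flat settings as 'key: value' lines (YAML booleans are lowercase)."""
    lines = []
    for key, value in settings.items():
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines)
```

For example, ‘print(render_yaml(ec2_discovery_settings("logging", "us-east-1", "es_cluster", "logging-prod")))’ prints one setting per line, ready to paste into the node’s configuration file.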

  • Utilizing Elastic Load Balancers & Auto Scaling Groups effectively

It’s a brilliant idea to put your Elasticsearch nodes into an Auto Scaling Group (ASG) behind an Elastic Load Balancer (ELB), but be cautious when you do. In our experience, CPU load is what will drive your ASGs to scale out. You may need to watch memory too, but as long as the instance is dedicated to Elasticsearch, with a fixed heap of no more than half the RAM (leaving the rest to the filesystem cache that Lucene relies on) and memory locking enabled, you should be fine. If you need more storage, you can easily attach additional EBS volumes to an instance.
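When sizing the heap for a given instance type, Elastic’s published guidance is to give the JVM heap no more than half of the machine’s RAM, leaving the remainder to the operating system’s filesystem cache, and to keep it just under 32 GB so the JVM can still use compressed object pointers. The Python sketch below applies that rule of thumb; the 31 GiB ceiling is our assumed safety margin. The result would be used as ‘ES_HEAP_SIZE’ (the environment variable read by the 1.x/2.x startup scripts) together with ‘bootstrap.mlockall: true’ in ‘elasticsearch.yml’ (renamed ‘bootstrap.memory_lock’ in later versions).

```python
def heap_size_mb(ram_gib: float, ceiling_gib: int = 31) -> int:
    """Heap size in MB: half of the instance's RAM, capped below the ~32 GB compressed-oops limit."""
    ram_mb = int(ram_gib * 1024)
    # The other half is deliberately left free for the OS filesystem cache.
    return min(ram_mb // 2, ceiling_gib * 1024)


def es_heap_size(ram_gib: float) -> str:
    """Value for the ES_HEAP_SIZE environment variable, e.g. '2048m'."""
    return f"{heap_size_mb(ram_gib)}m"
```

For a t2.medium (4 GiB) this yields ‘2048m’; for an r3.xlarge (30.5 GiB), ‘15616m’; and for an r3.4xlarge (122 GiB) the cap applies and it yields ‘31744m’.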

So what happens when you need to scale down? If your data nodes are in an ASG, be extremely careful to ensure that terminations never remove the primary and all replicas of any individual shard at the same time. Losing every copy of a shard means losing that shard’s data.
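One way to guard against this before a scale-in is to check which shards would lose every copy. The Python sketch below parses the output of ‘GET _cat/shards?h=index,shard,node’ (one line per shard copy, with an empty ‘node’ column for unassigned copies) and lists the shards whose copies all live on the nodes you plan to terminate. The node names in the example are hypothetical, and hooking this into your ASG’s termination process is left as an assumption.

```python
from typing import Dict, Iterable, List


def parse_cat_shards(text: str) -> Dict[str, List[str]]:
    """Map 'index/shard' to the nodes holding a copy, from _cat/shards?h=index,shard,node."""
    copies: Dict[str, List[str]] = {}
    for line in text.splitlines():
        # maxsplit=2 keeps node names that contain spaces intact.
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank lines
        nodes = copies.setdefault(f"{parts[0]}/{parts[1]}", [])
        if len(parts) == 3:
            nodes.append(parts[2].strip())  # assigned copy: record its node
    return copies


def shards_lost_by(copies: Dict[str, List[str]], terminating: Iterable[str]) -> List[str]:
    """Shards with no copy left once 'terminating' nodes are gone.

    Shards that currently have no assigned copy at all are reported as well.
    """
    doomed = set(terminating)
    return sorted(key for key, nodes in copies.items()
                  if all(node in doomed for node in nodes))
```

If the returned list is non-empty, the termination should wait (or replicas should be added and allowed to allocate elsewhere first).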

At RightBrain Networks, we roll out updates to an ASG using WaitConditions, a CloudFormation feature, to ensure that the cluster is healthy before terminating old instances and bringing up new ones. We do not let ELB health checks terminate any of the data or master nodes. Instead, we run per-host checks on each instance and try to rectify problems with more nuanced logic than simple termination (notably service restarts, changes to the replica count, etc.).
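A health gate of the kind described could be sketched as follows. This is a hypothetical, minimal Python illustration, assuming your update tooling fetches ‘GET /_cluster/health’ and only signals the CloudFormation WaitCondition (for example with ‘cfn-signal’) when the function returns True; Elasticsearch can also block server-side with ‘GET /_cluster/health?wait_for_status=green&timeout=60s’.

```python
import json


def ready_for_next_step(health_json: str, expected_nodes: int) -> bool:
    """Decide whether a rolling update may proceed, based on _cluster/health output."""
    health = json.loads(health_json)
    return (
        # Green: every primary and replica shard is assigned.
        health.get("status") == "green"
        # The replacement instance has actually joined the cluster.
        and health.get("number_of_nodes", 0) >= expected_nodes
        # No shards are still being moved between nodes.
        and health.get("relocating_shards", 0) == 0
    )
```

Checking relocations as well as status matters because a cluster can report green while it is still rebalancing shards onto a newly joined node.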

  • Set up your master-capable nodes correctly

Elasticsearch elects one node from a pool of ‘master-capable’ nodes to serve as the ‘master’. The master keeps track of which data nodes are present in the cluster and helps locate shards.

Because AWS is designed to be more transient than a traditional datacenter running Elasticsearch on bare metal, you should plan for failure up front. Elasticsearch should always be set up with an odd number of master-capable nodes and a quorum that requires a majority of them to be present before a master can be elected. This prevents a split-brain scenario, in which two halves of a partitioned cluster each elect their own master. The quorum is set with the ‘minimum_master_nodes’ setting (‘discovery.zen.minimum_master_nodes’, which still applies when discovery is handled by the Cloud AWS Plugin) and is best set with the formula floor(N/2) + 1, where N is the number of master-capable nodes. In other words, a strict majority of the master-capable nodes constitutes a quorum.
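The quorum formula, and the reason it prevents split brain, can be checked mechanically. In the Python sketch below, ‘quorum’ implements floor(N/2) + 1, and ‘split_brain_possible’ asks whether the master-capable nodes could be partitioned into two groups that each reach ‘minimum_master_nodes’ and so could each elect a master. The function names are our own.

```python
def quorum(master_capable: int) -> int:
    """Recommended minimum_master_nodes: a strict majority, floor(N/2) + 1."""
    return master_capable // 2 + 1


def split_brain_possible(master_capable: int, minimum_master_nodes: int) -> bool:
    """True if some two-way network partition lets both sides elect a master."""
    return any(
        side >= minimum_master_nodes and master_capable - side >= minimum_master_nodes
        for side in range(master_capable + 1)
    )
```

With three master-capable nodes, ‘quorum(3)’ is 2 and no partition can satisfy both sides, whereas setting the value to 1 would allow split brain. With four nodes the quorum is 3, so the cluster still tolerates the loss of only one master-capable node, the same as with three; this is why an odd count is preferred.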

Since most AWS regions offer at least three Availability Zones, we opt to use three AZs with one master-capable node in each and ‘minimum_master_nodes’ set to 2. That way, if an AZ goes down, the remaining two master-capable nodes still form a majority and the cluster keeps operating. In our experience, if your cluster does not scale or rearrange often and the service’s uptime is consistently high, these machines have very little to do beyond keeping track of the cluster from time to time. We have run master-capable nodes very successfully on ‘t2.medium’ and ‘t2.small’ instances rather than more costly non-bursting instance types.
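The AZ layout described above can be sanity-checked with a small, hypothetical Python helper. It takes the number of master-capable nodes in each Availability Zone plus a ‘minimum_master_nodes’ value and reports whether losing any single AZ still leaves enough master-capable nodes to elect a master. The AZ names in the example are placeholders.

```python
from typing import Dict


def survives_single_az_loss(masters_per_az: Dict[str, int], minimum_master_nodes: int) -> bool:
    """True if losing any one AZ still leaves at least minimum_master_nodes master-capable nodes."""
    total = sum(masters_per_az.values())
    return all(total - in_az >= minimum_master_nodes for in_az in masters_per_az.values())
```

With one master-capable node in each of three AZs and a quorum of 2, the check passes. With only two AZs, one of them must hold at least two of the three nodes, and losing that AZ drops the cluster below quorum, which is why the masters are spread across three zones.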

Conclusion

Elasticsearch and Amazon’s Elastic Compute Cloud are generally a good fit. However, setting them up is a bit more involved than a bare-metal deployment. That said, once your Elasticsearch cluster on EC2 is in a good place, you’ll find that it needs very little maintenance or monitoring.

I hope these suggestions, based on our experience, are helpful for both fledgling and established Elasticsearch clusters. As always, if you need further expertise, do not hesitate to drop us a line!

Glossary

  • index – a searchable collection of documents
  • shard – a single piece of the index
  • primary shard – the authoritative copy of a shard
  • replica shard – a mirror or backup copy of a shard
  • master node – the node currently keeping track of all other nodes
  • master-capable node – a node that is eligible to be elected master and votes in master elections
  • data node – a node that stores shards
  • CloudFormation – AWS’s infrastructure-as-code service