
We have an Elasticsearch cluster of 3 nodes with the following configuration:

    CPU Cores    Memory (GB)    Disk (GB)    IO Performance
    36           244            48000        very high

The machines are in 3 different availability zones, namely eu-west-1a, eu-west-1b, and eu-west-1c.

Each Elasticsearch instance is allocated 30 GB of heap space.
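For reference, the heap is pinned just below the ~32 GB compressed-oops threshold; a minimal sketch, assuming the ES 2.x Debian/RPM packaging (the file path is an assumption):

    # /etc/default/elasticsearch (path depends on the installation method)
    # Heap kept below ~32 GB so the JVM can still use compressed object pointers.
    ES_HEAP_SIZE=30g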

We are using the above cluster for running aggregations only. The indices have 1 replica per shard, all string fields are `not_analyzed`, and `doc_values` is enabled for all fields.
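To illustrate what we mean (index, type, and field names below are made up for the example; syntax is ES 2.x):

    PUT /events
    {
      "mappings": {
        "event": {
          "properties": {
            "zone":   { "type": "string", "index": "not_analyzed", "doc_values": true },
            "status": { "type": "string", "index": "not_analyzed", "doc_values": true },
            "bytes":  { "type": "long",   "doc_values": true }
          }
        }
      }
    }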

We are pumping data into this cluster by running 6 Logstash instances in parallel (each with a bulk batch size of 1000).
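The relevant part of each Logstash pipeline looks roughly like this; the hosts and index name are placeholders, and `flush_size` is the Logstash 2.x option that corresponds to the batch size:

    output {
      elasticsearch {
        hosts      => ["es-node-1:9200", "es-node-2:9200", "es-node-3:9200"]  # placeholders
        index      => "events-%{+YYYY.MM.dd}"                                 # illustrative
        flush_size => 1000   # documents per bulk request, as described above
      }
    }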

When more Logstash instances are started one by one, the nodes of the Elasticsearch cluster start throwing out-of-memory errors.

What optimizations could speed up the bulk indexing rate on this cluster? Would placing all the cluster nodes in the same zone increase bulk indexing throughput? Would adding more nodes to the cluster help?

A couple of steps taken so far (sketched as settings/commands after the list):

  • Increased the bulk queue size from 50 to 1000
  • Increased the refresh interval from 1 second to 2 minutes
  • Disabled segment merge throttling (https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html)
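For reference, the changes above correspond roughly to the following (ES 2.x setting names; `events` is a placeholder index name):

    # elasticsearch.yml (per node, requires restart): larger bulk queue
    threadpool.bulk.queue_size: 1000

    # Per-index dynamic setting: less frequent refreshes
    curl -XPUT 'localhost:9200/events/_settings' -d '{
      "index": { "refresh_interval": "2m" }
    }'

    # Cluster-wide dynamic setting: disable merge throttling
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "indices.store.throttle.type": "none" }
    }'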

We cannot set the number of replicas to 0 because of the inconsistency and data loss involved if one of the nodes goes down.

STandon
  • Where do those LS instances live? On the machines where the ES nodes are? – Andrei Stefan Nov 03 '16 at 15:21
  • On what basis have you increased the bulk queue size from 50 to 1000? That's probably a likely reason for OOM. – Andrei Stefan Nov 03 '16 at 15:21
  • I agree with the refresh_interval, but I don't agree with merge throttling, unless you tell me what kind of disks the nodes have. – Andrei Stefan Nov 03 '16 at 15:25
  • I suggest you increase the JVM heap allocated to 60 GB (max value as per Elastic). To efficiently tune the bulk queue size, you can consider the thread pool size for bulk indexing that queues up batch requests: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html – user3775217 Nov 03 '16 at 15:25
  • @user3775217 your suggestions are **against what Elastic recommends**. The maximum recommended heap size is **around 30GB** and the bulk queue size should stay as is and the **indexing process should be adjusted accordingly**. – Andrei Stefan Nov 03 '16 at 15:29
  • Sorry, I forgot that number, thanks for correcting. I assumed we could modify it both ways. – user3775217 Nov 03 '16 at 15:34
  • @andrei the nodes are on EC2 and have rotating disks; due to budget constraints we cannot go for SSDs. – STandon Nov 04 '16 at 09:48
  • Essentially, at a bulk queue size of 50 we were getting rejected-execution exceptions; we increased it by 100 at a time and finally arrived at a value of 1000. Our documents are 5 KB each, and in one batch we send 1000 documents. – STandon Nov 04 '16 at 09:50
  • Imagine a queue that is full (1000 bulk requests in it), each with ~5 MB of data (1000 documents × 5 KB). You get an average heap usage of 5 GB for that queue alone. Please provide the complete OOM error message and stack trace. – Andrei Stefan Nov 04 '16 at 09:52

0 Answers