I would appreciate it if someone could suggest the optimal number of shards per Elasticsearch node for best performance, or a recommended way to arrive at the number of shards one should use, given the number of cores and the memory footprint.
7 Answers
I'm late to the party, but I just wanted to point out a couple of things:
- The optimal number of shards per index is always 1. However, that provides no possibility of horizontal scale.
- The optimal number of shards per node is always 1. However, then you cannot scale horizontally more than your current number of nodes.
The main point is that shards have an inherent cost to both indexing and querying. Each shard is actually a separate Lucene index. When you run a query, Elasticsearch must run that query against each shard, and then compile the individual shard results together to come up with a final result to send back. The benefit to sharding is that the index can be distributed across the nodes in a cluster for higher availability. In other words, it's a trade-off.
Finally, it should be noted that any more than 1 shard per node will introduce I/O considerations. Since each shard must be indexed and queried individually, a node with 2 or more shards would require 2 or more separate I/O operations, which can't be run at the same time. If you have SSDs on your nodes then the actual cost of this can be reduced, since all the I/O happens much quicker. Still, it's something to be aware of.
That, then, raises the question: why would you want more than one shard per node? The answer is planned scalability. The number of shards in an index is fixed at creation time. The only way to add more shards later is to recreate the index and reindex all the data. Depending on the size of your index, that may or may not be a big deal. At the time of writing, Stack Overflow's index is 203 GB (see: https://stackexchange.com/performance). That's kind of a big deal to recreate, so resharding would be a nightmare. If you have 3 nodes and a total of 6 shards, you can later scale out to up to 6 nodes without resharding.
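To make that concrete, here's a minimal sketch of creating such an index over the REST API with Python's `requests`. The host and index name are illustrative assumptions; the 6-shard/1-replica figures just echo the 3-node example above.

```python
# Minimal sketch: create an index whose primary-shard count anticipates growth
# (3 nodes today, room to scale out to 6 later). Host and index name are assumptions.
import requests

settings = {
    "settings": {
        "number_of_shards": 6,     # fixed at index creation time
        "number_of_replicas": 1,   # one copy of each primary for failover
    }
}

resp = requests.put("http://localhost:9200/posts", json=settings)
resp.raise_for_status()
print(resp.json())
```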
- Let's say that you are storing log data and each day's worth is stored to a new index. If the query load is such that usually only the indices for recent days are searched, would there be a benefit from not loading the inverted indices of the older indices into memory? – Dan Hook Mar 25 '15 at 15:35
- @DanHook: Sorry, but I'm not sure I understand. Elasticsearch only loads into memory what it needs to work with. Each index is at least one Lucene index, so if your query doesn't span all the indexes, then the others don't need to be touched. Is that what you're asking? – Chris Pratt Mar 25 '15 at 15:40
- Pretty much. I was a little surprised by hearing that "the optimal number of shards per node is always 1," so I started trying to come up with exceptions. – Dan Hook Mar 25 '15 at 19:43
- Yeah, it just boils down to how many Lucene indexes need to be searched and where they live. A shard is merely a separate Lucene index. Each Elasticsearch index has at least one shard and therefore at least one Lucene index, but if you have 3 shards, for example, that's 3 Lucene indexes that have to be searched independently and then have their result sets combined before doing the final scoring and such. The situation is worse if some of these Lucene indexes reside on the same drive, as then you can only query them serially rather than in parallel. – Chris Pratt Mar 25 '15 at 20:02
- This advice is not as cut and dried as you make it sound. If you are reading every shard from disk, then you have a bigger problem than shard count. Having more than one shard per node on the same drive should not typically require serial I/O, because the OS will cache the shards that are being used. Since your box has many cores, possibly many processors, and many I/O lanes to memory, parallel I/O in this case should be stellar and provide better performance. I have been doing this since back when Solr only supported 1 shard per index; scalability and performance suffered. – AaronM May 18 '16 at 23:58
- Elasticsearch caches the *results* of queries, not the entire shard(s). A shard could be terabytes, depending on the size of the index, and it's simply not feasible to load the entire thing into RAM. As a result, you very much will encounter disk I/O performance limitations. However, depending on the speed of the drive and the complexity of the queries, including how much of the index must be consumed, you may never notice any significant performance degradation. Nevertheless, it's something to be aware of, which is why I pointed it out. – Chris Pratt May 25 '16 at 14:38
- Hi Chris, your explanation is nice! For the part "Elasticsearch must run that query against each shard, and then compile the individual shard results together to come up with a final result to send back", could you show me the source of that? I need to cite it in my work :) – Triet Doan Oct 23 '17 at 14:15
- @AnhTriet Here's the relevant section from the docs: https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-search.html – Chris Pratt Oct 23 '17 at 16:00
- May want to update some of your advice now, with the addition of ILM to Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html – slm Jun 25 '20 at 16:00
- @slm: That doesn't really change the math. The only thing that's different is that indexes can now be re-sharded, so you can rethink your shards-per-index number later. Everything else still applies. – Chris Pratt Jun 25 '20 at 16:07
There are three situations to consider before sharding:
Situation 1) You want to use Elasticsearch with failover and high availability. Then you go for sharding. In this case, you need to select the number of shards according to the number of nodes (ES instances) you want to use in production.
Consider that you want 3 nodes in production. Then you choose 1 primary shard and 2 replicas for every index; anything more is more shards than you need.
Situation 2) Your current server can hold the current data, but because the data keeps growing you may eventually run out of disk space, or the server may no longer be able to handle the load. Then you need to configure more shards, say 2 or 3 per index (it's up to your requirements), but there shouldn't be any replicas.
Situation 3) This is the combination of situations 1 and 2, so you combine both configurations. If your data grows dynamically and you also need high availability and failover, then you configure an index with 2 shards and 1 replica (see the sketch at the end of this answer). Then you can share data among nodes and get optimal performance..!
Note: A query is processed on each shard, then a map-reduce-style step combines the results from all shards and returns the final result. That map-reduce step is expensive, so the minimum number of shards gives optimal performance.
If you are using only one node in production, then one primary shard is the optimal number of shards for each index.
Hope it helps..!
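A minimal sketch of the Situation 3 settings (2 primary shards, 1 replica), assuming a local cluster at localhost:9200 and an illustrative index name:

```python
# Sketch of "Situation 3": 2 primaries with 1 replica each (4 shards in total).
# Host and index name are assumptions.
import requests

body = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
    }
}
requests.put("http://localhost:9200/my_index", json=body).raise_for_status()
```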
- I don't think you understand the concept of replicas. Let's say you set up your index to have 3 shards and 1 replica. That actually means you'll have **6** shards, though only three will ever be actively used at one time. 1 replica means 1 replica for *each* shard. For the safest failover for 3 shards, you'd actually want 2 replicas, so that you'd have one active shard and replicas of the other two active shards on each node. – Chris Pratt Apr 06 '15 at 19:46
- I agree with your comment @ChrisPratt; which point of the above answer contradicts your comment? – BlackPOP Apr 07 '15 at 10:12
- You suggested that if the OP has 3 nodes, then they should have 1 primary and 2 replicas. That would result in only one node being utilized, because there's only one shard. Meanwhile, the other two nodes would merely get a replica each. That provides failover, but no load-balancing. For a three node setup, the optimal number of shards would be 3 and the optimal number of replicas would be 2. That would give you the best performance and failover, but it also would not allow any horizontal expansion without reindexing. – Chris Pratt Apr 09 '15 at 14:17
- The OP does not make it clear if read or write performance is more important for their scenario, but an index with 1 primary and 2 replica shards will, as BlackPOP suggests, take full advantage of a 3 node cluster for read operations; see https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html – Dusty Jun 23 '15 at 16:50
Just got back from configuring some log storage for 10 TB so let's talk sharding :D
Node limitations
Main source: The Definitive Guide to Elasticsearch
HEAP: 32 GB at most:
If the heap is less than 32 GB, the JVM can use compressed pointers, which saves a lot of memory: 4 bytes per pointer instead of 8 bytes.
HEAP: 50% of the server memory at most. The rest is left to filesystem caches (thus 64 GB servers are a common sweet spot):
Lucene makes good use of the filesystem caches, which are managed by the kernel. Without enough filesystem cache space, performance will suffer. Furthermore, the more memory dedicated to the heap, the less is available for all your other fields using doc values.
[An index split in] N shards can spread the load over N servers:
1 shard can use all the processing power from 1 node (it's like an independent index). Operations on sharded indices are run concurrently on all shards and the result is aggregated.
Fewer shards are better (the ideal is 1 shard):
The overhead of sharding is significant. See this benchmark for numbers https://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/
Fewer servers are better (the ideal is 1 server with 1 shard):
The load on an index can only be split across nodes by sharding (one shard is enough to use all the resources on a node). More shards allow you to use more servers, but more servers bring more overhead for data aggregation... There is no free lunch.
Configuration
Usage: A single big index
We put everything in a single big index and let Elasticsearch do all the hard work relating to sharding data. There is no logic whatsoever in the application, so it's easier to develop and maintain.
Let's suppose that we plan for the index to be at most 111 GB in the future and we've got 50 GB servers (25 GB heap) from our cloud provider.
That means we should have 5 shards.
Note: Most people tend to overestimate their growth, so try to be realistic. For instance, this 111 GB example is already a BIG index. For comparison, the Stack Overflow index is 430 GB (2016), and it's a top-50 site worldwide, made entirely of text written by millions of people.
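One way to read that arithmetic, as a back-of-the-envelope sketch (the figures are just the ones assumed in this answer):

```python
# Rough sizing: aim for roughly one shard per node-heap's worth of data.
import math

expected_index_gb = 111   # projected maximum index size
heap_per_node_gb = 25     # 50 GB server with 50% of RAM given to the heap

shards = math.ceil(expected_index_gb / heap_per_node_gb)
print(shards)  # -> 5
```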
Usage: Index by time
When there is too much data for a single index, or it becomes too annoying to manage, the next step is to split the index by time period.
The most extreme example is logging applications (Logstash and Graylog), which use a new index every day.
The ideal configuration of 1-single-shard-per-index makes perfect sense in this scenario. The index rotation period can be adjusted, if necessary, to keep the index smaller than the heap.
Special case: Let's imagine a popular internet forum with monthly indices. 99% of requests hit the last index. We have to set multiple shards (e.g. 3) to spread the load over multiple nodes. (Note: this is probably unnecessary optimization. A 99% hit rate is unlikely in the real world, and the shard replicas could distribute part of the read-only load anyway.)
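For time-based indices, the 1-shard-per-index setting is usually baked into an index template so every new daily index picks it up automatically. A minimal sketch using the legacy template API; the template name, index pattern, and host are assumptions:

```python
# Sketch: every index matching logs-* gets a single primary shard and one replica.
import requests

template = {
    "index_patterns": ["logs-*"],   # e.g. logs-2020.06.25, one index per day
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
}
requests.put("http://localhost:9200/_template/logs", json=template).raise_for_status()
```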
Usage: Going Exascale (just for the record)
Elasticsearch is magic. It's the easiest database to set up as a cluster and it's one of the very few able to scale to many nodes (excluding Spanner).
It's possible to go exascale with hundreds of Elasticsearch nodes. There must be many indices and shards to spread the load over that many machines, and that takes an appropriate sharding configuration (adjusted per index where necessary).
The final bit of magic is to tune Elasticsearch routing to target specific nodes for specific operations.
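As a rough illustration of routing (strictly speaking, a routing value targets a specific shard, which in turn lives on a specific node), here's a sketch in which documents and searches that share a routing value only touch one shard. The index name, field names, and host are assumptions:

```python
# Sketch: custom routing keeps all of one user's documents on a single shard,
# so a search routed by the same value only has to query that shard.
import requests

doc = {"user": "alice", "message": "hello"}
requests.put(
    "http://localhost:9200/forum/_doc/1",
    params={"routing": "alice"},
    json=doc,
).raise_for_status()

resp = requests.post(
    "http://localhost:9200/forum/_search",
    params={"routing": "alice"},
    json={"query": {"match": {"user": "alice"}}},
)
print(resp.json()["hits"]["total"])
```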
- "thus 64 GB servers top, then scale horizontally" > This is not a valid statement. You _do_ need a 64 GB machine to max out the heap to just **under** 32 GB (to get compressed object pointers), but Elasticsearch and Lucene will heavily benefit from extra memory because it can use it for the file system cache. If you have 256 GB of memory and only one instance of ES running with 30 GB of heap, then the remaining 226 GB of RAM is available to use for the file system and ES will use as much as it can, thus letting it perform _extremely_ well. – pickypg Jul 24 '16 at 00:50
- True, there are huge datasets where going over 64 GB makes sense. They're uncommon, and people will have to do their own benchmarking to discover the good settings for them. Generally speaking, 3 x 64 GB servers give 96 GB heap + 96 GB cache. That can store tons of data, more than most people will ever have. If you dare to put 256 GB of data on a SINGLE node on purpose, you just sacrificed all form of redundancy and you failed at your job. – user5994461 Jul 24 '16 at 14:29
- I agree with @pickypg. Even with a smaller dataset size, you need to consider how you use it. For example, if you are heavily using aggregations or doing full-text search, the extra memory will be used by Lucene and the operating system to store doc values and cache segments. And of course you need to analyze and fine-tune other config values as well, based on your own use cases. – Ritesh Oct 01 '18 at 01:14
It might also be a good idea to have more than one primary shard per node, depending on the use case. I found that bulk indexing was pretty slow and only one CPU core was used, so we had idle CPU power and very low I/O; the hardware was definitely not the bottleneck. Thread pool stats showed that during indexing only one bulk thread was active. We have a lot of analyzers and a complex tokenizer (decomposed analysis of German words). Increasing the number of shards per node resulted in more bulk threads being active (one per shard on the node) and dramatically improved indexing speed.
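A hedged sketch of that kind of setup: an index created with several primary shards on the node, plus a newline-delimited _bulk request. The host, index name, shard count, and documents are all illustrative, not the actual values from this answer:

```python
# Sketch: more primary shards per node can allow more bulk threads to work in
# parallel during heavy indexing. All names and numbers here are made up.
import json
import requests

requests.put(
    "http://localhost:9200/articles",
    json={"settings": {"number_of_shards": 4, "number_of_replicas": 0}},
).raise_for_status()

# Build a newline-delimited _bulk body: one action line plus one source line per document.
lines = []
for i in range(1000):
    lines.append(json.dumps({"index": {"_index": "articles", "_id": i}}))
    lines.append(json.dumps({"text": f"sample document {i}"}))
payload = "\n".join(lines) + "\n"

requests.post(
    "http://localhost:9200/_bulk",
    data=payload,
    headers={"Content-Type": "application/x-ndjson"},
).raise_for_status()
```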
- Just out of curiosity, how many shards per node did you end up with? I have a similar setup. – sebwebdev Nov 14 '18 at 11:10
The number of primary shards and replicas depends on the following parameters:
- Number of data nodes: The replica shards for a given primary shard are meant to live on different data nodes. If there are 3 data nodes, DN1, DN2 and DN3, and a primary shard is on DN1, then its replicas should be on DN2 and/or DN3. Hence the number of replicas should be less than the total number of data nodes.
- Capacity of each data node: A shard cannot be larger than the data node's disk, so the number of primary shards should be chosen based on the expected size of the index.
- Recovery mechanism in case of failure: If the data in the index can be recovered quickly, then 1 replica should be enough.
- Performance requirements for the index: Sharding helps direct the client node to the appropriate shard, which improves performance, so the query parameters and the amount of data behind them should be considered when defining the number of primary shards.
These are the basic guidelines to follow; they should be tuned to your actual use cases (see the rough sketch below).
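A rough sketch of how those guidelines might turn into numbers. The ~50 GB per-shard cap is a common rule of thumb rather than something stated in this answer, and all the inputs are made-up examples:

```python
# Turn the checklist into arithmetic (illustrative values only).
import math

data_nodes = 3
expected_index_gb = 120
max_shard_gb = 50          # assumed per-shard size cap (rule of thumb, not from the answer)
desired_replicas = 1

primaries = max(1, math.ceil(expected_index_gb / max_shard_gb))
replicas = min(desired_replicas, data_nodes - 1)   # replicas must be fewer than data nodes

print(primaries, replicas)  # -> 3 1
```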
I have not tested this yet, but AWS has a good article about ES best practices. Look at the Choosing Instance Types and Testing parts.
Elastic.co recommends that you:
[…] keep the number of shards per node below 20 per GB heap it has configured
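As quick arithmetic (a sketch, assuming a node configured with a 30 GB heap):

```python
heap_gb = 30
max_shards = 20 * heap_gb   # the quoted rule: below 20 shards per GB of heap
print(max_shards)           # -> 600, so keep that node well under ~600 shards
```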