Why not allow replica shards in the same node that holds primary shard to increase performance/throughput?

Question

I understand that replica shards are used for two main purposes in Elasticsearch:

Providing high availability (i.e. backup)
Improving throughput by allowing search queries to run in parallel on multi-core CPUs

Elasticsearch does not allow a replica shard on the same node that holds its primary shard. The rationale is that replicas serve as backups, which would be meaningless if they were stored on the same node as the primary. I get that.

But in my case, I have a cluster with a single node and would like to add a replica to that node to improve throughput. I don't mind that I would still have a single point of failure (the original data is stored somewhere else), and I only have a single machine to work with. Why can't I add replica shards for performance reasons only, disregarding the backup aspect?

You can set the number of shards to 5 or some other value based on your data (the default is 1 in the latest version) and set replicas to 0. – Sagar Patel, Jan 11 '22 at 12:18
@SagarPatel, thanks for the comment! Makes sense, but what if the queries end up on the same shard? Wouldn't it still be better to have a complete replica of a single shard? – Omri Grinshpan, Jan 11 '22 at 12:28
Elasticsearch will not assign a replica of a shard to the same node where its primary is allocated. If you have only one node and you set both shards and replicas to 1, your cluster status will always be yellow. – Sagar Patel, Jan 12 '22 at 05:05
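
To make the suggestion in these comments concrete, here is a minimal sketch of an index-creation body with 5 primaries and 0 replicas, plus a toy check of the yellow status described above. The helper names `index_settings` and `expected_health` are hypothetical, and the health rule is a simplification that assumes all primaries are assigned and that every replica copy needs its own additional node; it is not how Elasticsearch itself computes health.

```python
import json


def index_settings(primaries: int, replicas: int) -> dict:
    """Build a create-index body using the standard shard/replica settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def expected_health(data_nodes: int, replicas: int) -> str:
    """Toy model (assumption): a shard's copies must all live on different
    nodes, so `replicas` copies plus the primary need replicas + 1 nodes.
    Otherwise some replicas stay unassigned and the cluster is yellow."""
    return "green" if data_nodes >= replicas + 1 else "yellow"


body = index_settings(primaries=5, replicas=0)
print(json.dumps(body, indent=2))

print(expected_health(data_nodes=1, replicas=0))  # green
print(expected_health(data_nodes=1, replicas=1))  # yellow
```

The body would be sent when creating the index (for example with a PUT request to the index name); on a single node, keeping `number_of_replicas` at 0 avoids permanently unassigned replicas.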

score 0 · Answer 1 · answered Jan 11 '22 at 14:01


Elasticsearch can execute concurrent requests on a single shard. See this thread:

the processing of a query is single threaded against each shard. Multiple queries can however be run concurrently against the same shard, so assuming you have more than one concurrent query, you can still use multiple cores.

So adding replicas on the same node would just consume disk space. The throughput gain from replicas comes from the data being distributed across multiple nodes, which allows the CPUs of all those nodes to be used to process your queries.
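
As a rough illustration of the point above, here is a toy model of how many search threads a single node can keep busy. It assumes the default size of the `search` thread pool is `int((allocated_processors * 3) / 2) + 1`, which is what I recall from the thread pool documentation (please verify for your version), and that each query runs one single-threaded task per primary shard it touches. The function names are hypothetical.

```python
def default_search_pool_size(allocated_processors: int) -> int:
    """Assumed default size of the fixed 'search' thread pool:
    int((allocated_processors * 3) / 2) + 1."""
    return (allocated_processors * 3) // 2 + 1


def busy_search_threads(concurrent_queries: int, primaries: int,
                        allocated_processors: int) -> int:
    """Toy model (assumption): each query creates one single-threaded task
    per primary shard, and the node runs at most pool-size tasks at once.
    Replica copies on the same node do not appear in this formula, because
    they would share the same thread pool and the same cores."""
    tasks = concurrent_queries * primaries
    return min(tasks, default_search_pool_size(allocated_processors))


# One shard, one query: only one thread is busy.
print(busy_search_threads(concurrent_queries=1, primaries=1, allocated_processors=8))
# One shard, three concurrent queries: three threads are busy, no replica needed.
print(busy_search_threads(concurrent_queries=3, primaries=1, allocated_processors=8))
# Heavy load: the thread pool, not the number of shard copies, is the limit.
print(busy_search_threads(concurrent_queries=10, primaries=5, allocated_processors=8))
```

In this model the only ways to raise the ceiling are more cores (or nodes) or a larger thread pool, which matches the argument that same-node replicas would add nothing.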

Pierre Mallet

I saw an example in a Udemy course where a single node held 2 replicas (of the same replication group), with the claim that "replica shards can be queried at the same time", which enables parallelization. So in that case we could have three queries running at the same time. Is this simply wrong? As far as I understand from your answer, a single shard copy would have the same effect as 2 or more, as long as we are talking about the same node. – Omri Grinshpan Jan 11 '22 at 15:01
@OmriGrinshpan yes, I think the course's assertion is somewhat wrong. Parallelization depends on the number of CPUs available and the thread pool configuration. Adding more replicas just ensures you have more nodes/CPUs available. – Pierre Mallet Jan 11 '22 at 15:32
So, this means that having a single shard per index on a single node will have the same performance as 5 shards per index? Or are there other performance aspects I'm disregarding here since I see a lot of optimization questions about "how many shards per index are optimal". – Omri Grinshpan Jan 12 '22 at 10:17
I also found this quote from another question: ["Having multiple shards does help taking advantage of parallel processing on a single machine"](https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch) – Omri Grinshpan Jan 12 '22 at 10:23
No, on a single node, having one shard or five is not the same. The number of primary shards matters; adding replica shards on the same node would just waste resources. – Pierre Mallet Jan 12 '22 at 13:03
But if a single shard supports multi-threading, why would multiple shards increase performance? – Omri Grinshpan Jan 12 '22 at 16:42
Each of those shards will hold less data, so its posting lists, terms dictionary and so on will be smaller, and each shard may respond faster. But there will be a coordination overhead to merge the results from the different shards. I suggest you read this post, https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster, to learn more about shards. – Pierre Mallet Jan 12 '22 at 16:53
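
The coordination overhead mentioned in the last comment can be sketched as a scatter-gather step: every shard returns its own top-k hits and the coordinating node merges them into a global top-k. This is only an illustration with made-up document IDs and scores; `shard_top_k` and `coordinator_merge` are hypothetical names, not Elasticsearch APIs.

```python
import heapq
from itertools import islice

# Hypothetical (doc_id, score) pairs held by three shards.
shards = [
    [("a", 0.90), ("b", 0.20)],
    [("c", 0.70), ("d", 0.40)],
    [("e", 0.95)],
]


def shard_top_k(docs, k):
    """Per-shard work: return the k best hits, sorted by descending score."""
    return heapq.nlargest(k, docs, key=lambda hit: hit[1])


def coordinator_merge(per_shard_hits, k):
    """Coordinator work: merge the already-sorted per-shard lists and keep
    the global top k. This extra step does not exist with a single shard."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, k))


per_shard = [shard_top_k(docs, k=2) for docs in shards]
top = coordinator_merge(per_shard, k=2)
print(top)  # [('e', 0.95), ('a', 0.9)]
```

Each shard searches a smaller slice of the data, but the coordinator has to collect and merge one result list per shard, so more shards means more merge work per query.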
