
I am learning Elasticsearch, and its documentation contains this passage:

Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.

Could someone please explain, in layman's terms, what the second sentence means?

Prabhjot

2 Answers


As a preamble you might want to go through another thread on SO that explains horizontal vs vertical scaling.

Most of the time, an ES cluster is designed to grow horizontally, meaning that whenever your cluster starts to show signs of weakness (slow queries, slow indexing, etc.), all you need to do is add one or more nodes to it. ES will then spread the load across more hardware and thus lighten the burden on the existing nodes. That's what horizontal scaling is all about, and ES is well suited to it given the way it partitions indexes into shards that get assigned to the nodes in your cluster.
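To make the shard idea concrete, here is a minimal toy sketch of what "spreading the load" looks like. The `allocate` function and the node names are hypothetical, and the round-robin placement is only an assumption for illustration; the real ES allocator is considerably more sophisticated.

```python
from collections import defaultdict


def allocate(num_shards, nodes):
    """Toy round-robin placement: shard i goes to node i mod len(nodes).

    This is NOT the real Elasticsearch allocation algorithm, only an
    illustration of shards being shared among the available nodes.
    """
    placement = defaultdict(list)
    for shard in range(num_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return dict(placement)


# An index with 6 shards on a 2-node cluster: 3 shards per node.
before = allocate(6, ["node-1", "node-2"])

# Add a third node: each node now only holds 2 shards.
after = allocate(6, ["node-1", "node-2", "node-3"])

print(before)
print(after)
```

The point is only that the same six shards end up shared among more machines, so each machine has less data to hold and fewer requests to serve.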

As you know, ES deliberately has no JOIN feature, for the reason mentioned above (i.e. "prohibitively expensive"). There are four ways to model relationships in ES:

The link you referred to, which introduces the nested, has_parent and has_child queries, is about the second and third bullet points above. Nested and parent/child documents have been designed to take as much advantage as possible of the index/shard partitioning model that ES supports.
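For reference, this is roughly what the three query types look like when written as request bodies, shown here as plain Python dicts. The field names (`comments`, `comments.author`, `body`, `title`) and the relation names (`question`, `answer`) are made up for the example; only the `nested`, `has_child` and `has_parent` structure comes from the ES query DSL.

```python
import json

# Match parent documents whose nested "comments" contain a given author.
nested_query = {
    "query": {
        "nested": {
            "path": "comments",  # hypothetical nested field
            "query": {"match": {"comments.author": "alice"}},
        }
    }
}

# Return parents ("question") that have at least one matching child ("answer").
has_child_query = {
    "query": {
        "has_child": {
            "type": "answer",  # hypothetical child relation name
            "query": {"match": {"body": "shard"}},
        }
    }
}

# Return children whose parent ("question") matches the inner query.
has_parent_query = {
    "query": {
        "has_parent": {
            "parent_type": "question",  # hypothetical parent relation name
            "query": {"match": {"title": "joins"}},
        }
    }
}

for body in (nested_query, has_child_query, has_parent_query):
    print(json.dumps(body, indent=2))
```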

When using a nested field (1-N relationship), each element of the nested array is just another hidden document under the hood, stored in the same shard as its enclosing document. When using a join field (1-N relationship), parent and child documents are separate documents in your index, and a child is always routed to the same shard as its parent. When your index grows (i.e. when you have more and more parent and child and/or nested data), you add nodes and the shards containing your documents get spread across the cluster transparently. Because related documents always live together on one shard, you can retrieve a document together with its related documents wherever they are stored, without having to perform expensive cross-node joins.
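Here is a small sketch of why this keeps the "join" cheap. ES picks the shard for a document by hashing its routing value (by default the document ID) modulo the number of primary shards. In the toy model below, `zlib.crc32` stands in for the real hash function (ES uses Murmur3), and `shard_for`, `index` and the document IDs are hypothetical names. The key point is that a child indexed with its parent's ID as the routing value lands on the parent's shard.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumed value for the example

# Toy cluster state: shard number -> list of document IDs stored there.
shards = {n: [] for n in range(NUM_PRIMARY_SHARDS)}


def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Simplified routing: hash(routing) % number_of_primary_shards.

    crc32 is only a stand-in for the hash function ES really uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def index(doc_id, routing=None):
    """Store a document on the shard chosen by its routing value."""
    shard = shard_for(routing if routing is not None else doc_id)
    shards[shard].append(doc_id)
    return shard


# The parent is routed by its own ID.
parent_shard = index("question-1")

# Children are routed by the PARENT's ID, so they follow the parent.
child_shards = [
    index(child_id, routing="question-1")
    for child_id in ("answer-1", "answer-2", "answer-3")
]

print(parent_shard, child_shards)
print(shards[parent_shard])
```

Since the parent and all of its children sit on one shard, a has_child or has_parent lookup can be resolved locally on that shard, no matter how many nodes the cluster has.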

Val

You can find more information about horizontal scaling here

In Elasticsearch terms, when you start two or more ES instances on the same network with the same cluster configuration, they connect to each other and form a distributed cluster. If you add one more machine (node), start an ES instance on it, and keep the cluster configuration the same, that node automatically joins the existing cluster, and the data and the request load are shared with it. Any request you make to ES, whether a read or a write, can be processed in parallel, so the speed you get depends on the number of nodes and on how the shards of each index are spread across them.
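As a rough illustration of the "processed in parallel" part, the sketch below fans a search out to every shard at once and merges the partial results. Everything in it (`shard_data`, `search_shard`, `search`, the sample documents) is invented for the example, and threads merely simulate shards living on different nodes; it is not how ES is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy index split into three shards, each holding a few documents.
shard_data = {
    0: ["elasticsearch joins", "sql joins"],
    1: ["horizontal scaling", "nested documents"],
    2: ["parent child joins"],
}


def search_shard(shard_id, term):
    """Search a single shard: return its documents containing the term."""
    return [doc for doc in shard_data[shard_id] if term in doc]


def search(term):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shard_data)) as pool:
        partials = list(
            pool.map(lambda sid: search_shard(sid, term), sorted(shard_data))
        )
    return sorted(hit for part in partials for hit in part)


hits = search("joins")
print(hits)
```

Each shard only scans its own slice of the data, which is why adding nodes (and spreading shards over them) can shorten the total time per request.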

Get more information here