As a preamble, you might want to go through another thread on SO that explains horizontal vs. vertical scaling.
Most of the time, an ES cluster is designed to grow horizontally: whenever your cluster starts to show signs of weakness (slow queries, slow indexing, etc.), all you need to do is add one or more nodes, and ES will spread the load over more hardware and thus lighten the burden on the existing nodes. That's what horizontal scaling is all about, and ES is well suited to it given the way it partitions indexes into shards that get assigned to the nodes in your cluster.
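To make the "add a node, lighten the load" idea concrete, here is a toy sketch. It is not how ES actually allocates shards (the real allocator weighs disk usage, replicas, awareness settings and more); it only assumes a simple round-robin placement, and the node names are made up for illustration.

```python
from collections import defaultdict


def allocate_shards(num_shards, nodes):
    """Toy round-robin placement of shard ids on nodes.

    This is an illustrative assumption, not Elasticsearch's real allocator.
    """
    allocation = defaultdict(list)
    for shard_id in range(num_shards):
        node = nodes[shard_id % len(nodes)]
        allocation[node].append(shard_id)
    return dict(allocation)


# Same index (6 shards), first on two nodes, then on three (hypothetical names).
before = allocate_shards(6, ["node-1", "node-2"])
after = allocate_shards(6, ["node-1", "node-2", "node-3"])

print("2 nodes:", before)
print("3 nodes:", after)
```

With the same number of shards, each node ends up holding fewer of them once a node is added, which is the whole point of scaling out.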
As you know, ES has no JOIN feature and they did it on purpose for the reason mentioned above (i.e. "prohibitively expensive"). There are four ways to model relationships in ES:
The link you referred to, which introduces the nested, has_parent and has_child queries, is about the second and third bullet points above. Nested and parent/child documents have been designed to take as much advantage as possible of the index/shard partitioning model that ES supports.
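For reference, this is roughly what the three query types look like, written here as Python dicts that serialize to the JSON request body you would send to the _search endpoint. The field names (comments, body, title) and the relation names (question, answer) are hypothetical; adapt them to your own mapping.

```python
import json

# Hypothetical field and relation names, for illustration only.

# Match root documents that have at least one matching nested element.
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {"match": {"comments.author": "alice"}},
        }
    }
}

# Return parent documents that have at least one matching child.
has_child_query = {
    "query": {
        "has_child": {
            "type": "answer",
            "query": {"match": {"body": "shards"}},
        }
    }
}

# Return child documents whose parent matches.
has_parent_query = {
    "query": {
        "has_parent": {
            "parent_type": "question",
            "query": {"match": {"title": "scaling"}},
        }
    }
}

for name, body in [
    ("nested", nested_query),
    ("has_child", has_child_query),
    ("has_parent", has_parent_query),
]:
    print(name, json.dumps(body))
```

Note the small asymmetry: has_child takes a type key, while has_parent takes parent_type.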
When using a nested field (1-N relationship), each element inside the nested array is just another hidden document under the hood, stored in the same shard as its root document. When using a join field (1-N relationship), parent and child documents are also regular documents stored in your index, and a child must be routed to the same shard as its parent. When your index grows (i.e. when you have more and more parent, child and/or nested data), you add nodes and the shards containing your documents get spread across the cluster transparently. Because related documents always live together in one shard, you can retrieve a document along with its related documents, wherever they are stored, without having to perform expensive cross-shard joins.
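As a minimal sketch of the two field types, the snippet below builds an index body with one nested field and one join field, plus the request for indexing a child document. The index name, field names, relation names and the child_index_request helper are all hypothetical. The one important detail is the routing parameter: as far as I know, a child document has to be indexed with routing set to its parent's id so that both land in the same shard.

```python
import json

# Hypothetical index body: one nested field and one join field.
index_body = {
    "mappings": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
            "relation": {
                "type": "join",
                "relations": {"question": "answer"},  # parent -> child
            },
        }
    }
}

# A parent document only declares which side of the relation it is on.
parent_doc = {"title": "How does ES scale?", "relation": {"name": "question"}}

# A child document also points at its parent's id.
child_doc = {
    "body": "By spreading shards over nodes.",
    "relation": {"name": "answer", "parent": "1"},
}


def child_index_request(index, doc_id, parent_id, doc):
    """Hypothetical helper: describe the HTTP request that indexes a child.

    routing=<parent id> keeps the child in the same shard as its parent.
    """
    return {
        "method": "PUT",
        "path": f"/{index}/_doc/{doc_id}?routing={parent_id}",
        "body": doc,
    }


request = child_index_request("qa", "2", "1", child_doc)
print(request["method"], request["path"])
print(json.dumps(request["body"]))
```

Nested elements need no such care: they are part of the root document's source, so they are always written to the same shard as the root.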