
The Elasticsearch documentation on the Percolate query recommends using separate indices for the percolate queries and for the documents being percolated:

Given the design of percolation, it often makes sense to use separate indices for the percolate queries and documents being percolated, as opposed to a single index as we do in examples. There are a few benefits to this approach:

  • Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.

  • Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.

This passage is at the bottom of the Percolate query reference page:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
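For reference, the two-index layout the documentation describes can be sketched as the request bodies below. This is a minimal Python sketch, not a recommended setup: the index roles, the single `message` text field, and the shard counts are hypothetical placeholders, and the functions only build JSON bodies without calling a cluster. One detail that matters for the layout: the `percolator` field type requires the fields referenced by stored queries to be mapped in the index that holds the queries, so the document field mappings appear in both indices.

```python
import json

# Hypothetical document schema: a single full-text "message" field.
DOCUMENT_FIELDS = {"message": {"type": "text"}}


def documents_index_body(shards: int = 1) -> dict:
    """Body for PUT /<documents-index>: holds only the percolated documents."""
    return {
        "settings": {"number_of_shards": shards},
        "mappings": {"properties": dict(DOCUMENT_FIELDS)},
    }


def queries_index_body(shards: int = 1) -> dict:
    """Body for PUT /<queries-index>: the percolator field plus copies of the
    document field mappings that the stored queries refer to."""
    properties = dict(DOCUMENT_FIELDS)  # shallow copy, DOCUMENT_FIELDS is not mutated
    properties["query"] = {"type": "percolator"}
    return {
        "settings": {"number_of_shards": shards},
        "mappings": {"properties": properties},
    }


def stored_query_body(text: str) -> dict:
    """Body for indexing one registered query into the queries index."""
    return {"query": {"match": {"message": text}}}


def percolate_inline_body(document: dict) -> dict:
    """Search body for GET /<queries-index>/_search with the document inline."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


def percolate_stored_body(index: str, doc_id: str) -> dict:
    """Search body that percolates a document already stored in another index,
    using the percolate query's `index` and `id` parameters."""
    return {"query": {"percolate": {"field": "query", "index": index, "id": doc_id}}}


if __name__ == "__main__":
    # Placeholder shard counts; the point is only that they can differ per index.
    print(json.dumps(queries_index_body(shards=5), indent=2))
    print(json.dumps(documents_index_body(shards=1), indent=2))
    print(json.dumps(percolate_inline_body({"message": "a new bonsai tree"}), indent=2))
```

Because the document field mappings are duplicated, any schema change has to be applied to both indices; in this sketch that duplication is kept in one place, `DOCUMENT_FIELDS`.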

I understand this in theory, but I'd like to know more about how necessary this is for a large index (say, 1 million registered queries).

The tradeoff in my case is that maintaining a separate index for the documents is quite a bit of extra work, mainly because both indices need to stay "in sync". This is difficult to guarantee without transactions, so I'm wondering whether the effort is worth it at the scale I need.

In general I'm interested in any advice regarding the design of the index/mapping so that it can be queried efficiently. Thanks!
