
I am not experienced in ES (my background is more in relational databases), and I am trying to build a search bar in my web application that searches its entire content (or whatever content I choose to index in ES).

The architecture is Jamstack: a Gatsby application fetches content (sometimes at build time, sometimes at runtime) from a Strapi application (headless CMS). In between, I developed a microservice that writes the documents created in Strapi to ES. At the moment there is only one index for all documents, regardless of type.

My problem is that as the application grows and different types of documents are created (sometimes very different from one another; for example, I can have an article (news) and a hospital), I am having a hard time querying the database correctly, because I have to define a lot of specific conditions in each query to cover all document types.

I see two possible solutions. Either I keep a single index and break the query down into several queries, which are all run when the user hits the search button, with the results merged before being presented; OR I split the single index into several, one per document type. The second option leads me to another question: is it possible to query multiple indices at once and specify index-specific fields in the query?

Which is the best approach? I hope I made myself clear.

Thanks in advance.

Ferran Buireu

1 Answer


Based on the example you provided, where one document can be of type news and another of type hospital, it makes sense to create multiple indices (but it also depends on how many such types you have). Both approaches have pros and cons; once you know them, you can choose one based on your use case.

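Before I list the pros and cons, here is the answer to your other question: yes, you can query multiple indices in a single request. You can list them in the search path (for example `GET /news,hospital/_search`, or use a wildcard pattern), or send several independent searches in one round trip with the multi-search API (`_msearch`).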
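For instance, one search request can span several indices by listing them comma-separated in the path, and a `bool` query can give each index its own fields by filtering each clause on the `_index` metadata field. This is a minimal sketch in TypeScript, assuming hypothetical `news` and `hospital` indices and hypothetical field names (`title`, `body`, `name`, `specialties`). It only builds the request; sending it with your HTTP or ES client of choice is left out.

```typescript
// Sketch: build ONE search request that spans two indices and applies
// index-specific fields to each. Index names and field names are
// hypothetical placeholders; replace them with your real mappings.

type Query = Record<string, unknown>;

interface SearchRequest {
  method: "POST";
  path: string;
  body: Query;
}

// Fields to search in each (hypothetical) index.
const fieldsByIndex: Record<string, string[]> = {
  news: ["title", "body"],
  hospital: ["name", "specialties"],
};

function buildCrossIndexSearch(text: string): SearchRequest {
  const indices = Object.keys(fieldsByIndex);
  // One "should" clause per index: the _index filter restricts the clause
  // to documents from that index, and multi_match searches its own fields.
  const perIndexClauses = indices.map((index) => ({
    bool: {
      filter: [{ term: { _index: index } }],
      must: [{ multi_match: { query: text, fields: fieldsByIndex[index] } }],
    },
  }));
  return {
    method: "POST",
    // A comma-separated list in the path searches all listed indices.
    path: `/${indices.join(",")}/_search`,
    body: {
      query: {
        bool: {
          should: perIndexClauses,
          minimum_should_match: 1,
        },
      },
    },
  };
}

const request = buildCrossIndexSearch("doctor");
console.log(request.method, request.path);
console.log(JSON.stringify(request.body, null, 2));
```

Adding a new document type then means adding one entry to `fieldsByIndex` rather than another hand-written condition in a shared query.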

Pros of having a single index

  1. Less management overhead than with multiple indices (this is why I asked how many such indices you may have in your application).
  2. Search queries can be more performant, as all the data is in a single place.

Cons

  1. You are indexing different types of documents, so you will have to include complex filters to get the data you need.
  2. Relevance will suffer: mixing different document types skews the IDF (inverse document frequency) component of the similarity algorithm (BM25), which affects scoring.
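To make the relevance point concrete, here is a sketch of the IDF term used by Lucene's BM25 similarity, idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number containing the term. The document counts are invented for illustration, and the picture is simplified: by default Elasticsearch gathers these statistics per shard rather than per index.

```typescript
// Sketch: how mixing document types in one index changes the BM25 IDF
// of a term. Formula as in Lucene's BM25Similarity:
//   idf = ln(1 + (N - n + 0.5) / (n + 0.5))
// N = number of documents, n = documents containing the term.
// All document counts below are hypothetical.

function bm25Idf(docCount: number, docFreq: number): number {
  return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
}

// Hypothetical "news" index: 1000 articles, 20 of which mention "doctor".
const newsOnly = bm25Idf(1000, 20);

// The same articles plus 500 hospital docs, 450 of which mention
// "doctor", all stored in one shared index.
const mixed = bm25Idf(1000 + 500, 20 + 450);

console.log(`idf("doctor") in news-only index: ${newsOnly.toFixed(3)}`);
console.log(`idf("doctor") in mixed index:     ${mixed.toFixed(3)}`);
```

Because the hospital documents make "doctor" a common term, it contributes much less to the score of a news article that mentions it, even though nothing about the news data itself changed.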

Pros of having multiple indices

  1. Separating the data by its properties gives more relevant results.
  2. Your search queries will not be complex.
  3. If you have a really large amount of data, it makes sense to split it up to keep shard sizes optimal and get better performance.

Cons

  1. More management overhead.
  2. If you need to search across all indices, you have to send a search that spans them (a multi-index search or a multi-search) and wait for every index to return its results, which might be costly.
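If you go the multi-search route instead, each index gets its own independent search, all sent in one `_msearch` request whose body is newline-delimited JSON (a header line naming the index, then the search body, with a trailing newline). This sketch assumes the same hypothetical index and field names, builds that body, and merges the per-index hits by `_score`. The response shape is trimmed to the fields used here, and the sample response is hand-written rather than taken from a real cluster.

```typescript
// Sketch: one search per index sent in a single _msearch request, then
// merged. Index names, fields and the sample response are hypothetical.

interface Hit {
  _index: string;
  _id: string;
  _score: number;
}

// Trimmed-down response shape: only the fields this sketch reads.
interface MsearchResponse {
  responses: { hits: { hits: Hit[] } }[];
}

const searches: { index: string; fields: string[] }[] = [
  { index: "news", fields: ["title", "body"] },
  { index: "hospital", fields: ["name", "specialties"] },
];

// NDJSON body: for each search, a header line naming the index followed
// by the search body; the whole body must end with a newline.
function buildMsearchBody(text: string): string {
  const lines: string[] = [];
  for (const s of searches) {
    lines.push(JSON.stringify({ index: s.index }));
    lines.push(
      JSON.stringify({
        query: { multi_match: { query: text, fields: s.fields } },
        size: 10,
      }),
    );
  }
  return lines.join("\n") + "\n";
}

// Responses arrive in the same order as the searches. Sorting by _score
// is only a rough heuristic: each search is scored with its own index's
// statistics, so scores from different indices are not strictly comparable.
function mergeByScore(res: MsearchResponse): Hit[] {
  const all: Hit[] = [];
  for (const r of res.responses) {
    all.push(...r.hits.hits);
  }
  return all.sort((a, b) => b._score - a._score);
}

// Usage with a hand-written response (no cluster needed):
const body = buildMsearchBody("doctor");
console.log(body);
const merged = mergeByScore({
  responses: [
    { hits: { hits: [{ _index: "news", _id: "n1", _score: 2.1 }] } },
    { hits: { hits: [{ _index: "hospital", _id: "h7", _score: 3.4 }] } },
  ],
});
console.log(merged.map((h) => `${h._index}/${h._id}`));
```

The body is sent with the `application/x-ndjson` content type; the total latency is bounded by the slowest of the individual searches, which is the cost mentioned in the list above.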
Amit
  • Thanks a lot for your answer, Elasticsearch Ninja. In fact, this is a new project, so I am not sure, but I can predict that there will indeed be a lot of different types of documents. Those two were just examples; I believe I will have a significant number of document types, around 20/30+. Do you think I should definitely go for the multi-index approach? Besides that, there will also be a lot of documents of each type. I can't predict a number right now, but I expect it to be large, given that people will be publishing news every day or so. – João Furriel Nov 25 '20 at 09:07
  • @JoãoFurriel, if you have a significant number of docs in each category and there are going to be just 20-30 indices (not a large number for ES), you should definitely go with the multi-index approach, so that it's more scalable and flexible in the future :) – Amit Nov 25 '20 at 09:26
  • The 20/30 figure is the number of document types, not the number of documents. There will be hundreds/thousands of documents of each type. – João Furriel Nov 25 '20 at 09:29
  • @JoãoFurriel, yeah, I understood that in the first place :). Having 20-30 indices in an ES cluster is still not a big deal; if you had far fewer docs, I would have advised the single-index approach :) – Amit Nov 25 '20 at 09:31
  • Oh, ok :) Assuming I go for the multi-index approach, my question is: is the relevance of a document in one index calculated taking into account documents in other indices? Using my example above, if I search for "doctor" and I have one index for news and another for hospitals, will the ranking of the results take into account documents of both types, "hospital" and "news", that contain the word "doctor", even though they live in different indices? – João Furriel Nov 25 '20 at 09:37
  • @JoãoFurriel, no, relevance is calculated considering all the documents in a single index, so if you have `hospital` and `news` indices and search for `doctor` in the `hospital` index, the docs in `news` or any other index in your cluster will not be considered, which is correct as well :) – Amit Nov 25 '20 at 09:40