
I am exploring deepset Haystack and find it very useful for multiple use cases such as chatbots, search engines, and document search.

However, I have not found any reference showing how to create multiple indexes for different documents and then search a specific index. I thought of using meta tags for conditional search (restricted to a particular area): tag the documents first, then use the params parameter of the query API. That doesn't seem to work and throws an error (I used the vanilla docker-compose based setup).


dmigo
Varun

1 Answer


You can indeed use multiple indices in the same document store to support multiple use cases. The `write_documents` method of the document store has an `index` parameter, so you can store the documents for each use case in a separate index. In the same way, you can pass an `index` parameter to the query method.
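To illustrate the idea of per-use-case indices, here is a minimal sketch. `ToyDocumentStore` is a hypothetical in-memory stand-in written for this example, not Haystack's implementation; only the `index` parameter on `write_documents` and on the query method comes from the answer above. The index names and the naive keyword matching are assumptions.

```python
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical in-memory stand-in; not Haystack's implementation."""

    def __init__(self):
        # One independent list of documents per index name.
        self._indices = defaultdict(list)

    def write_documents(self, documents, index="document"):
        # Documents written to one index are invisible to the others.
        self._indices[index].extend(documents)

    def query(self, query, index="document"):
        # Naive keyword match, restricted to the chosen index.
        terms = query.lower().split()
        return [
            doc for doc in self._indices.get(index, [])
            if any(term in doc["text"].lower() for term in terms)
        ]


store = ToyDocumentStore()
store.write_documents(
    [{"text": "Algeria is a country in North Africa."}], index="geography"
)
store.write_documents(
    [{"text": "Our chatbot answers billing questions."}], index="support"
)

# The same query only sees the documents of the index it targets.
geography_hits = store.query("Algeria", index="geography")
support_hits = store.query("Algeria", index="support")
```

With a real document store the calls would look similar in shape, but the exact signatures depend on the store class and the Haystack version, so check them before relying on this.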

As you expected, there is an alternative solution that uses the meta field of documents. However, the format needs to be slightly different. Your query needs to have the following format:

{"query": "What's the capital town?", "params": {"filters": {"name": "75_Algeria75.txt"}}}

and your documents need to have the following format:

{'text': 'Algeria is...', 'meta':{'name': "75_Algeria75.txt"}}
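As a rough sketch of how the two formats fit together, the snippet below builds the request body and checks which documents a filter would select. The helpers `build_query_payload` and `matches_filters` and the second document (`other.txt`) are hypothetical and exist only for this illustration. The matching uses plain equality, which may be simpler than what the document store actually does.

```python
import json


def build_query_payload(query, filters):
    # Mirrors the request body shown above: filters are nested under "params".
    return {"query": query, "params": {"filters": filters}}


def matches_filters(document, filters):
    # A document passes when every filter key equals the value in its meta.
    meta = document.get("meta", {})
    return all(meta.get(key) == value for key, value in filters.items())


documents = [
    {"text": "Algeria is...", "meta": {"name": "75_Algeria75.txt"}},
    {"text": "Some other text", "meta": {"name": "other.txt"}},  # hypothetical
]

payload = build_query_payload(
    "What's the capital town?", {"name": "75_Algeria75.txt"}
)
selected = [
    doc for doc in documents
    if matches_filters(doc, payload["params"]["filters"])
]

# JSON body that would be sent to the query API.
body = json.dumps(payload)
```

The point to note is that the filter key (`name`) must match a key inside each document's `meta` dictionary, not a top-level field.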
Julian Risch
  • Thanks for the answer. I figured it out later too, but I still don't understand how to filter on score. – Varun Feb 04 '22 at 04:08
  • Also, with a meta filter the query takes roughly the same amount of time as it does without one (according to the logs, it tried to process the other documents too). Shouldn't it be faster, since it should search only the required docs? – Varun Feb 04 '22 at 04:24
  • In most cases the retrieval should be faster when filters are applied, yes. The document retrieval filtering in Haystack can be applied only to the metadata of documents. I am not sure what you want to achieve with filtering by score. – Julian Risch Feb 04 '22 at 15:54
  • How can we add `top_k_retriever` to the query? – Varun Jul 07 '22 at 12:57
  • @Varun `params={"Retriever": {"top_k": 5}}` in the `pipeline.run()` method is the way to set a `top_k` for the retriever node in the latest Haystack version. – Julian Risch Jul 08 '22 at 13:16
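A small sketch of the `params` dictionary from the last comment: the outer key is the name of the retriever node in the pipeline, and the inner dictionary holds that node's arguments. The helper `build_run_params` is hypothetical, and the node name `"Retriever"` is an assumption that must match the name the node was given when the pipeline was built.

```python
def build_run_params(top_k, retriever_name="Retriever"):
    # Keyed by node name, so the setting only reaches that node.
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {retriever_name: {"top_k": top_k}}


params = build_run_params(5)

# Assuming `pipeline` is an already constructed Haystack pipeline:
# result = pipeline.run(query="What's the capital town?", params=params)
```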