I have an Elasticsearch index for the Wikipedia corpus with documents of four types (paragraph, infobox, list, and table). Each document has the following fields: page_title, section_path, and document_text.

I'm querying this corpus with a question, "what country is Narora located in?", scoped to a single page ("List of nuclear power stations") and document type ("table"), using the query below. It should return the table documents found on this Wikipedia page.

{
    "size": 100,
    "_source": [
        "id",
        "document_text",
        "section_path",
        "page_title"
    ],
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "document_text": "what country is Narora located in?"
                    }
                },
                {
                    "match": {
                        "section_path": "what country is Narora located in?"
                    }
                },
                {
                    "match": {
                        "page_title": "what country is Narora located in?"
                    }
                }
            ],
            "must": [
                {
                    "match": {
                        "page_title": "List of nuclear power stations"
                    }
                },
                {
                    "match": {
                        "paragraph_type": "table"
                    }
                }
            ]
        }
    }
}

This query returns an empty result. When I remove either the section_path or the page_title should clause, I get multiple table documents back, including the "In service" one, whose document_text field mentions Narora.

Granted, the page_title ("List of nuclear power stations") and section_path ("In service", "Under construction" ...) values don't overlap with the question. But I'm surprised by this behavior, as a should clause is only supposed to affect the scoring, not what matches (source). So adding a should clause shouldn't cause Elasticsearch to return an empty result.

Any thoughts on what could be going on here? Is there any way to force Elasticsearch to ignore the should clauses when they don't match and still return ALL the documents that match the must criteria?

Harsh Trivedi
  • What is the version of Elasticsearch? – hkulekci Dec 26 '22 at 19:19
  • The version is `7.10.2`. – Harsh Trivedi Dec 26 '22 at 19:21
  • Could you try to use `minimum_should_match: 0` ? – hkulekci Dec 26 '22 at 19:23
  • The document you linked on StackOverflow is quite old (from 2015), and Elastic also silently made some changes to the minimum_should_match feature in that version. – hkulekci Dec 26 '22 at 19:25
  • I could not try that version, but I think the problem is related to the default value of `minimum_should_match`. – hkulekci Dec 26 '22 at 19:28
  • Thanks, I did try that but it had no effect. While trying to figure out why it didn't have an effect, I found [this](https://github.com/elastic/elasticsearch/issues/52735) and [this](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html) which say that value = 0 is just ignored (granted though that the doc isn't for the same version). – Harsh Trivedi Dec 26 '22 at 19:31
  • Also, there is another question [here](https://discuss.elastic.co/t/when-does-minimum-should-match-change-default-value/286234) related to the same topic. – hkulekci Dec 26 '22 at 19:35
  • I think something was broken and they changed the semantics of this field. – hkulekci Dec 26 '22 at 19:36
  • Maybe we can try to restructure the query. In short, we want the should clauses to be ignored when filtering the data, but we still want some extra score when they match, right? – hkulekci Dec 26 '22 at 19:40
  • Thanks hkulekci, for sharing the link! I'm not able to find a solution from it, but it's certainly helpful to know. ... Yes, what you describe is what I want. Can you suggest how to restructure the query? – Harsh Trivedi Dec 26 '22 at 19:47
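
One possible restructuring along the lines discussed in the comments, offered as an untested sketch rather than a confirmed fix: keep the two scoping clauses in must, and wrap the question clauses in a nested bool whose should list also contains a match_all. Because match_all always matches, the nested bool should never exclude a document, while any question clause that does match still adds to the score. The helper name build_scoped_query is hypothetical, paragraph_type is copied from the query in the question, and the code only builds the request body. It has not been run against 7.10.2 and does not explain why the original query came back empty.

```python
import json

# Fields the question text is matched against, as in the original query.
QUESTION_FIELDS = ("document_text", "section_path", "page_title")


def build_scoped_query(question, page_title, doc_type, size=100):
    """Build a search body where the scoping clauses decide what matches
    and the question clauses are meant to influence only the score.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # match_all guarantees the nested bool always has one matching clause,
    # so the question clauses can add score but cannot exclude documents.
    scoring_clauses = [{"match_all": {}}]
    scoring_clauses += [{"match": {field: question}} for field in QUESTION_FIELDS]

    return {
        "size": size,
        "_source": ["id", "document_text", "section_path", "page_title"],
        "query": {
            "bool": {
                "must": [
                    # Scoping clauses, unchanged from the original query.
                    {"match": {"page_title": page_title}},
                    {"match": {"paragraph_type": doc_type}},
                    # Optional scoring clauses, wrapped so they always match.
                    {
                        "bool": {
                            "should": scoring_clauses,
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        },
    }


if __name__ == "__main__":
    body = build_scoped_query(
        "what country is Narora located in?",
        "List of nuclear power stations",
        "table",
    )
    print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of a search request in place of the original query. If the scoping clauses should not contribute to the score at all, moving them from must to filter would be another variation to try.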