6

I am trying to implement exact-match search in Elasticsearch, but I am not getting the results I need. The code below shows the problem and what I have tried.

doc1 = {"sentence": "Today is a sunny day."}
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}

# Index the documents above.
# Assumes a client created elsewhere, e.g.:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
es.index(index="english", doc_type="sentences", id=1, body=doc1)
es.index(index="english", doc_type="sentences", id=2, body=doc2)
es.index(index="english", doc_type="sentences", id=3, body=doc3)
es.index(index="english", doc_type="sentences", id=4, body=doc4)
es.index(index="english", doc_type="sentences", id=5, body=doc5)

Query 1:

res = es.search(
    index="english",
    body={
        "from": 0,
        "size": 5,
        "query": {
            "match_phrase": {
                "sentence": {"query": "Today is a sunny day"}
            }
        },
        "explain": False,
    },
)

Query 2:

res = es.search(
    index="english",
    body={
        "from": 0,
        "size": 5,
        "query": {
            "bool": {
                "must": {
                    "match_phrase": {
                        "sentence": {"query": "Today is a sunny day"}
                    }
                },
                "filter": {
                    "term": {"sentence.word_count": 5}
                },
            }
        },
    },
)

When I run query 1, doc2 is the top result, but I want doc1 to be the top result.

When I add a filter to restrict matches to documents with the same number of words as the query, as in query 2, I get no results.

I would be really grateful for any help with this. I want an exact match for the given query, not a match that merely contains it.

Thanks

Gaurav Chawla
  • Why do you set slop to 3 in query 1 if you want exact phrase matching? – Val Sep 06 '18 at 15:01
  • By that I mean I want the match to contain the same words; the order can be different. – Gaurav Chawla Sep 06 '18 at 19:06
  • Then it's not "exact matching", you should update your question to make it clear ;-) – Val Sep 06 '18 at 20:03
  • removed slop ;-) – Gaurav Chawla Sep 06 '18 at 21:05
  • Maybe [this](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html) can help? – ozanmuyes Sep 14 '18 at 10:29
  • My gut tells me that your index has 5 primary shards and too few documents for the scores to be meaningful. If you create an index with a single primary shard, your first query will return the document you expect. Promise ;-) – Val Sep 14 '18 at 11:08
  • @GauravChawla did you try what I suggest in my comment above? – Val Sep 19 '18 at 03:49
  • @Val Yes that's why I upvoted your comment. It worked. You kept your promise ;) I am just still figuring out how just 1 shard will affect the performance at scale. – Gaurav Chawla Sep 20 '18 at 04:38
  • You can still have multiple shards to scale, but the scoring will only be relevant if you have more documents than just a few. You might want to read this: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch – Val Sep 20 '18 at 05:06
  • Thanks @Val. Your input has been really helpful. I am now considering other scenarios where Elasticsearch should return nothing if doc1 is not present (otherwise it will return doc2). In simple words, it should return only an exact match, or else nothing. Is there any way to achieve that? – Gaurav Chawla Sep 20 '18 at 08:51

3 Answers

2

My gut tells me that your index has 5 primary shards and too few documents for the scores to be meaningful. If you create an index with a single primary shard, your first query will return the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch
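
To illustrate why a tiny, sharded index can rank documents oddly, here is a rough sketch of the IDF part of BM25 in the form Lucene uses, evaluated on whatever statistics a single shard happens to see. The function names and the example counts are illustrative assumptions, not values measured on the question's cluster.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25 as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def single_shard_index_body():
    """Index-creation body that keeps every document in one primary shard."""
    return {"settings": {"number_of_shards": 1}}


# Hypothetical shard-local statistics for the same query term:
idf_alone = bm25_idf(doc_count=1, doc_freq=1)   # matching doc is alone on its shard
idf_shared = bm25_idf(doc_count=3, doc_freq=1)  # matching doc shares a shard with two others

# The term gets a different weight depending on which shard holds the document.
print(idf_alone, idf_shared)
```

With the client from the question, the settings body could be passed as `es.indices.create(index="english", body=single_shard_index_body())`. Passing `search_type="dfs_query_then_fetch"` to `es.search` is another way to score with index-wide statistics while keeping several shards.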

One way to achieve what you want is by using the keyword type but with a normalizer to lowercase the data so it's easier to search for exact matches in a case insensitive way.

Create your index like this:

PUT english
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lc_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "sentences": {
      "properties": {
        "sentence": {
          "type": "text",
          "fields": {
            "exact": {
              "type": "keyword",
              "normalizer": "lc_normalizer"
            }
          }
        }
      }
    }
  }
}

Then you can index your documents as usual.

PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...

Finally, you can search for an exact match. The query below returns only doc1:

POST english/_search
{
  "query": {
    "match": {
      "sentence.exact": "today is a sunny day"
    }
  }
}
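
For the Python client used in the question, the same search body can be built as a plain dict. The sketch below also mimics, locally and only approximately, what a lowercase-normalized keyword field does: it compares whole strings after lowercasing, with no tokenization. `exact_query` and `simulate_exact` are hypothetical helper names, and document 6 is an invented example.

```python
def exact_query(text):
    """Search body targeting the normalized keyword sub-field from the mapping above."""
    return {"query": {"match": {"sentence.exact": text}}}


def simulate_exact(text, docs):
    """Local approximation only: whole-string comparison after lowercasing."""
    wanted = text.lower()
    return [doc_id for doc_id, doc in docs.items()
            if doc["sentence"].lower() == wanted]


docs = {
    1: {"sentence": "Today is a sunny day"},
    2: {"sentence": "Today is a sunny day but tomorrow it might rain"},
    6: {"sentence": "Today is a sunny day."},  # hypothetical doc with a trailing period
}

matches = simulate_exact("today is a sunny day", docs)
print(matches)  # [1]
```

Because a keyword field compares the entire value, punctuation and surrounding whitespace count. The question's original doc1 ends with a period, so it would match only if the period is dropped at index time (as in this answer) or included in the query text. The body would be sent with `es.search(index="english", body=exact_query("Today is a sunny day"))`.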
Val
1

Try using a bool query

    PUT test_index/doc/1
    {"sentence": "Today is a sunny day"}

    PUT test_index/doc/2
    {"sentence": "Today is a sunny day but tomorrow it might rain"}

Use a terms query on the keyword sub-field for the exact match and a multi_match phrase query for the other matches:

    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "sentence.keyword": [
                  "Today is a sunny day"
                ]
              }
            },
            {  
              "multi_match":{  
                "query":"Today is a sunny day",
                "type":"phrase",
                "fields":[  
                    "sentence"
                ]
              }
            }
          ]
        }
      }
    }

Another option is to use multi_match for both clauses, with the keyword match first and boosted by 5 and the other match left unboosted:

PUT test_index/doc/1
{"sentence": "Today is a sunny day"}

PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}


GET test_index/_search
{  
  "query":{  
    "bool":{  
      "should":[  
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
              "sentence.keyword"
            ],
            "boost":5
          }
        },
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
                "sentence"
            ]
          }
        }
      ]
    }
  }
}
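
Both `should` variants above only rank the exact match higher; doc2 is still returned. Below is a minimal sketch, assuming the default dynamic mapping created a `sentence.keyword` sub-field, of Python helpers that build either a ranking query or a strict one that moves the exact clause into `filter`. The helper names are hypothetical.

```python
def exact_first_query(text, boost=5.0):
    """Rank a whole-value match above phrase matches; both kinds are returned."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Whole-value, case-sensitive comparison on the keyword sub-field
                    {"term": {"sentence.keyword": {"value": text, "boost": boost}}},
                    # Analyzed phrase match on the text field
                    {"match_phrase": {"sentence": text}},
                ]
            }
        }
    }


def exact_only_query(text):
    """Return only documents whose whole value equals `text`."""
    return {"query": {"bool": {"filter": {"term": {"sentence.keyword": text}}}}}


body = exact_only_query("Today is a sunny day")
print(body)
```

With the client from the question this could be run as `es.search(index="test_index", body=body)`. If no document has exactly that value, the strict variant returns no hits instead of falling back to doc2.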
Polynomial Proton
0

This query will work:

{
    "query":{
        "match_phrase":{
            "sentence":{
                "query":"Today is a sunny day"
            }
        }
    },
    "size":5,
    "from":0,
    "explain":false
}
Archit Rastogi
  • Thanks for the answer, Archit, but it won't work if I change doc2 to "Today is a sunny day but tomorrow it might rain". Hence I updated doc2. My bad, I should have made my question clearer. Hope it makes sense now. – Gaurav Chawla Sep 07 '18 at 07:07
  • If you want exact matching, you can make the field a keyword instead of a string. – Archit Rastogi Sep 08 '18 at 08:56