I am using Elasticsearch and trying to combine match_phrase (phrase/string matching) with fuzziness, but it seems to be impossible: there are few examples online and no such case in the documentation.

What I need: phrase/string matching + fuzziness + slop based on every value of the field individually.

What I've tried so far (and I still don't have a solution I need):

query_string - it supports both fuzziness and slop. However, it assembles the phrase from all of the values of the field within one document, rather than matching inside a single value.

match_phrase - it supports slop, but not fuzziness. The good part: it looks for a phrase match in at least one individual value of the field, instead of assembling the phrase across all values of the document's field.

Does anybody have experience with phrase matching that includes fuzziness in Elasticsearch?

Thanks in advance.

BC1554
  • Sorry to ask, but did you try fuzzy? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html . Did you take a look at analyzers too? – LeBigCat Dec 06 '18 at 15:47
  • Hi, @LeBigCat , – BC1554 Dec 06 '18 at 16:13
  • Such query would just bring back all the documents from the index. So not helping that much. – BC1554 Dec 06 '18 at 16:23
  • This should help you https://stackoverflow.com/questions/53541053/elasticsearch-match-phrase-query-and-fuzzy-query-can-both-be-used-in-conju/53542938#53542938 and this one too https://stackoverflow.com/questions/53407151/fuzziness-in-bool-query-with-multimatch-elasticsearch/53409671#53409671 – Kamal Kunjapur Dec 06 '18 at 20:31
  • Hi @Kamal, the second link you provided "collects" a phrase by going through all values of the field, so that is a no-go for me. I just started testing the first link, so I will update on it soon. And thanks for the support! – BC1554 Dec 07 '18 at 07:10
  • @BC1554 Sure, but I'm assuming you went through the section `Fuzzy without phrase` in the second link. There is a section called `Fuzzy with phrase` in the second link which is same as the first link. Except that it has an example of using multiple fields. – Kamal Kunjapur Dec 07 '18 at 07:19
  • @Kamal I am not sure span works as it should in the first link, because at least in my current dataset I don't see it working as expected. If span works (when "in_order": true), I would expect the same results after such a switch: "fuzzy": { "education.title": "Stanford" } .... "fuzzy": { "education.title": "University" } should produce the same results as: "fuzzy": {"education.title": "University"} .... "fuzzy": { "education.title": "Stanford" } Please correct me if I am wrong. P.S.: isn't there a conflict? "in_order": true and span = 100? – BC1554 Dec 07 '18 at 08:01
  • Set `in_order:false`. With `true` it won't find `University Stanford` with the first query you've mentioned in comment. – Kamal Kunjapur Dec 07 '18 at 08:06
  • Nice, now it works. Thanks a lot. Is it possible to manage fuzziness? @Kamal – BC1554 Dec 07 '18 at 09:01
  • @BC1554 Please check the answer I've posted as how to manage the fuzziness. That should be it. Feel free to upvote and accept the answer ;-) – Kamal Kunjapur Dec 07 '18 at 09:20

1 Answer

You can make use of Span Queries for this as I've mentioned in the links in the comment section of the question.

What you are further looking for is a way to control fuzziness using Span Queries. I've taken an example from this SOF answer and rewritten the query so that you can manage the fuzziness of each term.

Query

POST <your_index_name>/_search
{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "span_near":{  
                  "clauses":[  
                     {  
                        "span_multi":{  
                           "match":{  
                              "fuzzy":{  
                                 "name":{  
                                    "value":"champions",
                                    "fuzziness":2
                                 }
                              }
                           }
                        }
                     },
                     {  
                        "span_multi":{  
                           "match":{  
                              "fuzzy":{  
                                 "name":{  
                                    "value":"league",
                                    "fuzziness":2
                                 }
                              }
                           }
                        }
                     }
                  ],
                  "slop":0,
                  "in_order":false
               }
            }
         ]
      }
   }
}
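
The same request body can be generated for any phrase instead of being written by hand. Below is a minimal Python sketch of that idea; the helper name fuzzy_phrase_query is hypothetical, and it assumes the field is analyzed into lowercase, whitespace-separated tokens (roughly what the standard analyzer produces), because fuzzy is a term-level query and its value is not analyzed. It omits the bool/must wrapper from the query above, which is only needed when combining with other clauses.

```python
import json


def fuzzy_phrase_query(field, phrase, fuzziness=2, slop=0, in_order=False):
    """Build a search body with one fuzzy span_multi clause per word.

    Hypothetical helper, not part of any Elasticsearch client.
    Assumption: indexed tokens are lowercase and whitespace-separated,
    so the phrase is lowercased and split the same way here.
    """
    terms = phrase.lower().split()
    if not terms:
        raise ValueError("phrase must contain at least one word")

    # Each word becomes a fuzzy term query wrapped in span_multi,
    # which is what lets span_near use it as a clause.
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": term, "fuzziness": fuzziness}}
                }
            }
        }
        for term in terms
    ]

    # A single word needs no proximity constraint.
    if len(clauses) == 1:
        return {"query": clauses[0]}

    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": slop,          # positions allowed between the words
                "in_order": in_order,  # False also matches the reversed order
            }
        }
    }


if __name__ == "__main__":
    body = fuzzy_phrase_query("name", "Champions League")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the _search request shown above. Raising slop loosens how far apart the words may be, while fuzziness controls the edit distance allowed per word.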

Hope this helps!

Kamal Kunjapur
  • @Kamal, hi. Now I have run into another problem: how can I select only the values of the field that match the fuzzy query? Let's say the field education holds different university names, like: education : [MIT, Stanford University, Michigan University], but I want to select only Stanford University. – BC1554 Dec 11 '18 at 05:49
  • Let's say I run an aggregation on each fuzzy query; it returns ALL counts and all university names from the field education. What I need is aggregations only on the exact values that match the fuzzy query. For example, if I run a fuzzy query for Stanford University and the field education holds the values [MIT, Stanfordddd University, Michigan University], I would like the query to return only the value 'Stanfordddd University', not all three of them. Thanks! – BC1554 Dec 11 '18 at 05:52
  • Hey @BC1554, the only way I can think of is to write a script for this. Could you post this as a new question? It may be a while before I come up with the script, and posting it as a new question will allow other members to dive into this. – Kamal Kunjapur Dec 11 '18 at 06:17
  • I can post it only in 90 minutes, as I made another post just a couple of minutes ago. – BC1554 Dec 11 '18 at 06:27
  • No probs, but do post it when you can, I'm sure people will have alternative solutions. Meanwhile I'll work on this script and update you as soon as I can. – Kamal Kunjapur Dec 11 '18 at 06:28
  • Please find the problem described here: https://stackoverflow.com/questions/53720080/elastic-search-exact-field-value-retrieval – BC1554 Dec 11 '18 at 08:27
  • Hi @Kamal, I have one more question, and seems like you might know the answer: https://stackoverflow.com/questions/53728135/elastic-search-loop-over-values-of-two-indices – BC1554 Dec 12 '18 at 04:15