
TL;DR

Is it possible to have Elasticsearch return the matched input shingle alongside the matched document in a fuzzy query?

Example:

Let's say I have a shingle filter:

"fulltext_shingle_filter":{
  "type": "shingle",
  "min_shingle_size": 2,
  "max_shingle_size": 3,
  "output_unigrams": false
}

And that shingle filter is used in a custom search analyzer:

"fulltext_shingle":{
  "type": "custom",
  "tokenizer": "standard",
  "filter":["fulltext_shingle_filter"]
}

The field is indexed with a keyword analyzer like so:

"whitelist_keyword": {
  "type": "custom",
  "tokenizer": "keyword"
}

with documents looking something like this:

{
  "_source": {
    "names": [
      "John Smith",
      "Smith, John"
    ]
  }
},
{
  "_source": {
    "names": [
      "Mr Wayne"
    ]
  }
}

And we query like this:

POST /someindex/_search
{
  "query": {
    "match": {
      "names": {
        "query": "Hi, my Name is John Smit, I like toast.",
        "analyzer": "fulltext_shingle",
        "fuzziness": 1
      }
    }
  }
}

This would split the query into shingles using our fulltext_shingle analyzer and apply a fuzziness of 1 to each of them, including the shingle "John Smit". Elasticsearch then returns the document containing "John Smith", as the Levenshtein distance between the two is 1.
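To make the matching step concrete, here is a rough Python sketch of what I understand to be happening. It only approximates the analysis chain: the `\w+` regex stands in for the standard tokenizer, plain Levenshtein distance stands in for Elasticsearch's fuzzy matching, and the function names (`shingles`, `levenshtein`, `matching_shingles`) are made up for illustration and are not part of any Elasticsearch API.

```python
import re


def shingles(text, min_size=2, max_size=3):
    """Approximate fulltext_shingle: word tokens joined into 2- and 3-word shingles."""
    # Assumption: a simple \w+ split is close enough to the standard tokenizer here.
    tokens = re.findall(r"\w+", text)
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


def levenshtein(a, b):
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def matching_shingles(query, indexed_name, fuzziness=1):
    """Return the query shingles within `fuzziness` edits of the indexed keyword."""
    return [s for s in shingles(query) if levenshtein(s, indexed_name) <= fuzziness]


print(matching_shingles("Hi, my Name is John Smit, I like toast.", "John Smith"))
# prints ['John Smit']
```

The value printed here, "John Smit", is exactly the piece of information I would like Elasticsearch to hand back with the hit.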

Now, is it possible to have Elasticsearch return the input shingle as it was before the fuzzing, i.e. "John Smit", alongside the matched document?

The only thing I could think of was to essentially reverse the query, i.e. index the query text as a document with shingles enabled and then query it for the original indexed value ("John Smith") with the same fuzziness. But that seems like a terribly error-prone and resource-wasting hassle to me.

MoorzTech
