We're using the Elasticsearch completion suggester with the standard analyzer, but it seems like the text is not being tokenized.

e.g.

Texts: "First Example", "Second Example"

Search: "Fi" returns "First Example"

While

Search: "Ex" doesn't return any result returns "First Example"

– Guy Korland

3 Answers


As the Elasticsearch documentation says about the Completion Suggester:

The completion suggester is a so-called prefix suggester.

So when you send a keyword, it only matches against the beginning (prefix) of your indexed texts.

E.g:

Search: "Fi" => "First Example"

Search: "Sec" => "Second Example"

But if you give Elasticsearch "Ex", it returns nothing, because no indexed text begins with "Ex".

You can try other suggesters, such as the Term Suggester.
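
For comparison, a term suggester request looks roughly like this (a sketch only: it assumes an ordinary analyzed text field named "title", which is not part of any mapping shown here, and note that the term suggester corrects misspelled words by edit distance rather than completing prefixes):

POST /example/_search
{
    "suggest": {
        "title-suggestion": {
            "text": "exampel",
            "term": {
                "field": "title"
            }
        }
    }
}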

– Trong Lam Phan

A great workaround is to tokenize the string yourself and index the tokens in a separate tokens field. You can then use two suggestions in your suggest query to search both fields.

Example:

PUT /example
{
    "mappings": {
        "doc": {
            "properties": {
                "full": {
                    "type": "completion"
                },
                "tokens": {
                    "type": "completion"
                }
            }
        }
    }
}

POST /example/doc/_bulk
{ "index":{} }
{"full": {"input": "First Example"}, "tokens": {"input": ["First", "Example"]}}
{ "index":{} }
{"full": {"input": "Second Example"}, "tokens": {"input": ["Second", "Example"]}}

POST /example/_search
{
    "suggest": {
        "full-suggestion": {
            "prefix" : "Ex", 
            "completion" : { 
                "field" : "full",
                "fuzzy": true
            }
        },
        "token-suggestion": {
            "prefix": "Ex",
            "completion" : { 
                "field" : "tokens",
                "fuzzy": true
            }
        }
    }
}

Search result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "full-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": []
      }
    ],
    "token-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "Ikvk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "First Example"
              },
              "tokens": {
                "input": [
                  "First",
                  "Example"
                ]
              }
            }
          },
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "I0vk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "Second Example"
              },
              "tokens": {
                "input": [
                  "Second",
                  "Example"
                ]
              }
            }
          }
        ]
      }
    ]
  }
}
– M.Vanderlee
  • The manual tokenization is very important. You need to generate shingles (word-grams) to keep the search from returning 0 results as soon as you begin to type the second word of a sentence. If you had a 3-word input such as "First Example Code", you wouldn't be able to return any results for "Example co" without shingles of the entire phrase. – Silas Hansen Apr 08 '21 at 13:45
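
To make that comment concrete: with client-side shingling, the bulk document for a three-word phrase would look something like this (a sketch against the mapping above; the shingle list is generated by your own code before indexing):

{ "index":{} }
{"full": {"input": "First Example Code"}, "tokens": {"input": ["First", "Example", "Code", "First Example", "Example Code", "First Example Code"]}}

The prefix "Example co" can then match this document via its "Example Code" entry.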

One approach to hack in suggestions from every position of the string is to shingle the string, keep only the shingles that start at position 0, and take the last token of each such shingle.

PUT example
{
  "settings": {
    "index.max_shingle_diff": 10,
    "analysis": {
      "filter": {
        "after_last_space": {
          "type": "pattern_replace",
          "pattern": "(.* )",
          "replacement": ""
        },
        "preserve_only_first": {
          "type": "predicate_token_filter",
          "script": {
            "source": "token.position == 0"
          }
        },
        "big_shingling": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "dark_magic": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "big_shingling",
            "preserve_only_first",
            "after_last_space"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "dark_magic",
        "search_analyzer": "standard"
      }
    }
  }
}

This hack works for short strings (up to 10 tokens in the example).
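
To sanity-check what the analyzer emits, you can run a phrase through the _analyze API; for "First Example Code" the filter chain above should produce the tokens "first", "example" and "code", all at position 0:

POST /example/_analyze
{
  "analyzer": "dark_magic",
  "text": "First Example Code"
}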