We're using the Elasticsearch completion suggester with the standard analyzer, but it seems like the text is not being tokenized.

e.g.

Texts: "First Example", "Second Example"

Search: "Fi" returns "First Example"

While

Search: "Ex" doesn't return any result returns "First Example"

– Guy Korland

3 Answers


As the Elasticsearch documentation says about the Completion Suggester:

The completion suggester is a so-called prefix suggester.

So when you send a keyword, it only matches against the beginning (prefix) of your indexed texts.

E.g:

Search: "Fi" => "First Example"

Search: "Sec" => "Second Example"

But if you give Elasticsearch "Ex", it returns nothing, because no indexed text begins with "Ex".

You can try other suggesters, such as the Term Suggester.
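
For comparison, a term suggester request looks roughly like this (a sketch only: it assumes an ordinary analyzed text field named "title", which is not part of any mapping shown here, and note that the term suggester corrects misspelled words by edit distance rather than completing prefixes):

POST /example/_search
{
    "suggest": {
        "title-suggestion": {
            "text": "exampel",
            "term": {
                "field": "title"
            }
        }
    }
}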

– Trong Lam Phan

A great workaround is to tokenize the string yourself and index the tokens in a separate tokens field. You can then use two suggestions in your suggest query to search both fields.

Example:

PUT /example
{
    "mappings": {
        "doc": {
            "properties": {
                "full": {
                    "type": "completion"
                },
                "tokens": {
                    "type": "completion"
                }
            }
        }
    }
}

POST /example/doc/_bulk
{ "index":{} }
{"full": {"input": "First Example"}, "tokens": {"input": ["First", "Example"]}}
{ "index":{} }
{"full": {"input": "Second Example"}, "tokens": {"input": ["Second", "Example"]}}

POST /example/_search
{
    "suggest": {
        "full-suggestion": {
            "prefix" : "Ex", 
            "completion" : { 
                "field" : "full",
                "fuzzy": true
            }
        },
        "token-suggestion": {
            "prefix": "Ex",
            "completion" : { 
                "field" : "tokens",
                "fuzzy": true
            }
        }
    }
}

Search result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "full-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": []
      }
    ],
    "token-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "Ikvk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "First Example"
              },
              "tokens": {
                "input": [
                  "First",
                  "Example"
                ]
              }
            }
          },
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "I0vk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "Second Example"
              },
              "tokens": {
                "input": [
                  "Second",
                  "Example"
                ]
              }
            }
          }
        ]
      }
    ]
  }
}
– M.Vanderlee
  • The manual tokenization is very important. You need to generate shingles (word-grams) to keep the search from returning 0 results as soon as you begin to type the second word of a sentence. If you had a 3-word input such as "First Example Code", you wouldn't be able to return any results for "Example co" without shingles of the entire phrase. – Silas Hansen Apr 08 '21 at 13:45
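
To make that comment concrete: with client-side shingling, the bulk document for a three-word phrase would look something like this (a sketch against the mapping above; the shingle list is generated by your own code before indexing):

{ "index":{} }
{"full": {"input": "First Example Code"}, "tokens": {"input": ["First", "Example", "Code", "First Example", "Example Code", "First Example Code"]}}

The prefix "Example co" can then match this document via its "Example Code" entry.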

One approach to hack in suggestions from every position of the string is to shingle the string, keep only the shingles that start at position 0, and take the last token of each such shingle.

PUT example
{
  "settings": {
    "index.max_shingle_diff": 10,
    "analysis": {
      "filter": {
        "after_last_space": {
          "type": "pattern_replace",
          "pattern": "(.* )",
          "replacement": ""
        },
        "preserve_only_first": {
          "type": "predicate_token_filter",
          "script": {
            "source": "token.position == 0"
          }
        },
        "big_shingling": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "dark_magic": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "big_shingling",
            "preserve_only_first",
            "after_last_space"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "dark_magic",
        "search_analyzer": "standard"
      }
    }
  }
}

This hack works for short strings (up to 10 tokens in the example).
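
To sanity-check what the analyzer emits, you can run a phrase through the _analyze API; for "First Example Code" the filter chain above should produce the tokens "first", "example" and "code", all at position 0:

POST /example/_analyze
{
  "analyzer": "dark_magic",
  "text": "First Example Code"
}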