I am using Elasticsearch 7.2.0 and I want to create search suggestions.

For example, I have these 3 movie titles:

Avengers: Infinity War
Avengers: Infinity War Part 2
Avengers: Age of Ultron

When I type "aven", it should return suggestions like:

avengers
avengers infinity
avengers age

When I type "avengers inf", it should return:

avengers infinity war
avengers infinity war part 2

After going through a lot of Elasticsearch tutorials, I have done this:

Check Cluster

PUT movies
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding",
              "trim"
            ],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          },
          "completion_analyzer": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": [
              "letter"
            ]
          }
        }
      }
    }
  },
  "mappings": {

      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keywordstring": {
              "type": "text",
              "analyzer": "keyword_analyzer"
            },
            "edgengram": {
              "type": "text",
              "analyzer": "edge_ngram_analyzer",
              "search_analyzer": "edge_ngram_search_analyzer"
            },
            "completion": {
              "type": "completion"
            }
          },
          "analyzer": "standard"
        },
        "completion_terms": {
          "type": "text",
          "fielddata": true,
          "analyzer": "completion_analyzer"
        }
      }

  }
}

with the following docs:

POST movies/_doc/1
{
  "name": "Spider-Man: Homecoming",
  "completion_terms": [
    "spider-man",
    "homecomming"
  ]
}

POST movies/_doc/2
{
  "name": "Ant-man and the Wasp",
  "completion_terms": [
    "ant-man",
    "and",
    "the",
    "wasp"
  ]
}

POST movies/_doc/3
{
  "name": "Avengers: Infinity War Part 2",
  "completion_terms": [
    "avangers",
    "infinity",
    "war",
    "part",
    "2"
  ]
}

POST movies/_doc/4
{
  "name": "Captain Marvel",
  "completion_terms": [
    "captain",
    "marvel"
  ]
}

POST movies/_doc/5
{
  "name": "Black Panther",
  "completion_terms": [
    "black",
    "panther"
  ]
}

POST movies/_doc/6
{
  "name": "Avengers: Infinity War",
  "completion_terms": [
    "avangers",
    "infinity",
    "war"
  ]
}

POST movies/_doc/7
{
  "name": "Thor: Ragnarok",
  "completion_terms": [
    "thor",
    "ragnarok"
  ]
}

POST movies/_doc/8
{
  "name": "Guardians of the Galaxy Vol 2",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy",
    "vol",
    "2"
  ]
}

POST movies/_doc/9
{
  "name": "Doctor Strange",
  "completion_terms": [
    "doctor",
    "strange"
  ]
}

POST movies/_doc/10
{
  "name": "Captain America: Civil War",
  "completion_terms": [
    "captain",
    "america",
    "civil",
    "war"
  ]
}

POST movies/_doc/11
{
  "name": "Ant-Man",
  "completion_terms": [
    "ant-man"
  ]
}

POST movies/_doc/12
{
  "name": "Avengers: Age of Ultron",
  "completion_terms": [
    "avangers",
    "age",
    "of",
    "ultron"
  ]
}

POST movies/_doc/13
{
  "name": "Guardians of the Galaxy",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy"
  ]
}

POST movies/_doc/14
{
  "name": "Captain America: The Winter Soldier",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "winter",
    "solider"
  ]
}

POST movies/_doc/15
{
  "name": "Thor: The Dark World",
  "completion_terms": [
    "thor",
    "the",
    "dark",
    "world"
  ]
}

POST movies/_doc/16
{
  "name": "Iron Man 3",
  "completion_terms": [
    "iron",
    "man",
    "3"
  ]
}

POST movies/_doc/17
{
  "name": "Marvel’s The Avengers",
  "completion_terms": [
    "marvels",
    "the",
    "avangers"
  ]
}

POST movies/_doc/18
{
  "name": "Captain America: The First Avenger",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "first",
    "avanger"
  ]
}

POST movies/_doc/19
{
  "name": "Thor",
  "completion_terms": [
    "thor"
  ]
}

POST movies/_doc/20
{
  "name": "Iron Man 2",
  "completion_terms": [
    "iron",
    "man",
    "2"
  ]
}

POST movies/_doc/21
{
  "name": "The Incredible Hulk",
  "completion_terms": [
    "the",
    "incredible",
    "hulk"
  ]
}

POST movies/_doc/22
{
  "name": "Iron Man",
  "completion_terms": [
    "iron",
    "man"
  ]
}

and the query

POST movies/_search
{
  "suggest": {
    "movie-suggest-fuzzy": {
        "prefix": "avan",
        "completion": {
          "field": "name.completion",
          "fuzzy": {
            "fuzziness": 1
          }
      }
    }
  }
}

My query returns the full title, not pieces of it.

Dobra Adrian

1 Answer

It's a great question and shows you have done a lot of research to get this working, but you are making it unnecessarily complex by trying to handle it entirely in ES. I had exactly the same use case and solved it by combining application-side logic with ES.

What you actually need is a match query on the first (n-1) terms and a prefix query on the nth (last) search term. For aven, the only term is also the nth term, so the prefix query runs on it. For avengers inf, avengers goes in a match query and inf goes in the prefix query.

I just indexed the documents given by you and tried both the search terms mentioned and it works:

Index creation

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            }
        }
    }
}

Index 3 docs

{
  "name" : "Avengers: Age of Ultron"
},
{
  "name" : "Avengers: Infinity War Part 2"
},
{
  "name" : "Avengers: Infinity War"
}

Search query (a match query on the first (n-1) terms and a prefix query on the nth term)

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "avengers"
                    }
                },
                {
                    "prefix": {
                        "name": "ag"
                    }
                }
            ]
        }
    }
}

Basically, in your application code you need to split the search input on whitespace and then construct the bool query with a match clause for each of the first (n-1) terms and a prefix query on the nth term.
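
As a rough illustration of that application-side step, here is a minimal Python sketch. The function name build_suggest_query and the match_none fallback for empty input are my own assumptions, not part of any Elasticsearch client. It only builds the request body, which you would then send to movies/_search with whatever client you use. The input is lowercased because a prefix query is not analyzed, while the standard analyzer lowercases the indexed terms.

```python
from typing import Any, Dict, List


def build_suggest_query(user_input: str, field: str = "name") -> Dict[str, Any]:
    """Build a bool query: match on the first n-1 terms, prefix on the last.

    Hypothetical helper for illustration; it returns a plain dict that can
    be serialized as the body of a _search request.
    """
    # Split on whitespace; lowercase because prefix queries are not analyzed.
    terms: List[str] = user_input.lower().split()
    if not terms:
        # Assumption: empty input should match nothing rather than everything.
        return {"query": {"match_none": {}}}

    *complete_terms, last_term = terms
    # One match clause per fully typed term.
    must: List[Dict[str, Any]] = [
        {"match": {field: term}} for term in complete_terms
    ]
    # The term still being typed becomes a prefix clause.
    must.append({"prefix": {field: last_term}})
    return {"query": {"bool": {"must": must}}}
```

For the input "avengers inf" this yields a match clause on avengers and a prefix clause on inf. For "aven" it yields a single prefix clause.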

Note that you don't even need the edge n-gram analyzer and the other complex settings at index time, which saves a lot of space in your index. You might want to put a character limit on the prefix query, though, as it can be costly when searching millions of docs: unlike a match query, it is not a token-to-token match.
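
One way to read "character limit" is a minimum length for the last term before a prefix clause is issued at all, since very short prefixes expand to the most terms. A small Python sketch of that guard follows. The names prefix_clause and MIN_PREFIX_CHARS and the threshold of 2 are assumptions for illustration and should be tuned against your own data.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold: shortest last term that still gets a prefix clause.
MIN_PREFIX_CHARS = 2


def prefix_clause(last_term: str, field: str = "name",
                  min_chars: int = MIN_PREFIX_CHARS) -> Optional[Dict[str, Any]]:
    """Return a prefix clause, or None when the term is too short to be worth it."""
    term = last_term.strip().lower()
    if len(term) < min_chars:
        # Too short: the caller can skip the request or drop the prefix clause.
        return None
    return {"prefix": {field: term}}
```

The caller decides what to do with None, for example waiting for more keystrokes before querying.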

Amit
  • I have done something, check out this https://stackoverflow.com/questions/59693038/elasticsearch-suggestions-using-shingle Is there a way to optimize this? – Dobra Adrian Jan 13 '20 at 07:02
  • @DobraAdrian did you get a chance to try out my answer and let me know if you have any questions? – Amit Jan 18 '20 at 10:54