elasticsearch ngram slow when large numbers of documents match

Question

I am implementing a search-as-you-type feature following this example: Edge NGram with phrase matching.

The query time seems to depend on the number of matching documents, even though I am requesting only the top 5 documents.

My index has 320 million documents. When my query is "l", 6 million documents match and the query takes 22 ms. However, when my query is "a", 133 million documents match and the query takes 400 ms. Again, I am asking for only the top 5 documents.
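As a rough check of that impression, the two measurements quoted above can be compared directly. This is only a back-of-the-envelope sketch in Python: the variable names are made up for illustration, and two data points cannot prove a linear relationship.

```python
# Figures taken from the two measurements quoted in the question.
matches_l, time_ms_l = 6_000_000, 22      # query "l"
matches_a, time_ms_a = 133_000_000, 400   # query "a"

# How much larger is the match set, and how much slower is the query?
match_ratio = matches_a / matches_l
time_ratio = time_ms_a / time_ms_l

print(f"match count grows by a factor of {match_ratio:.1f}")
print(f"query time grows by a factor of {time_ratio:.1f}")
```

The two factors come out in the same range (roughly 22x more matches for roughly 18x more time), which is consistent with the query cost growing with the number of matching documents rather than with the requested `size`.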

See below for my index definition and query.

I am trying to get all my queries under 100 ms. How do I achieve this? What am I missing?

Here is my index definition:

`

PUT /ss
{
    "settings": {
        "analysis": {
            "filter": {
                "english_poss_stemmer": {
                    "type": "stemmer",
                    "name": "possessive_english"
                },
                "edge_ngram": {
                    "type": "edgeNGram",
                    "min_gram": "1",
                    "max_gram": "25",
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            },
            "analyzer": {
                "edge_ngram_analyzer": {
                    "filter": [
                        "lowercase",
                        "english_poss_stemmer",
                        "edge_ngram"
                    ],
                    "tokenizer": "standard"
                },
                "my_standard": {
                    "filter": [
                        "lowercase",
                        "english_poss_stemmer"
                    ],
                    "tokenizer": "standard"
                }
            }
        }
    },
    "mappings": {
        "ss": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "name": {
                    "search_analyzer": "my_standard",
                    "analyzer": "edge_ngram_analyzer",
                    "type": "text"
                },
                "type": {
                    "search_analyzer": "keyword",
                    "analyzer": "keyword",
                    "type": "text"
                },
                "tax_id": {
                    "search_analyzer": "keyword",
                    "analyzer": "keyword",
                    "type": "text"
                }
            }
        }
    }
}

`

Here is my query:

GET /ss/_search
{
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "match_phrase": {
                    "name": "a"
                }
            }
        }
    }
}

score 0 · Answer 1 · edited Oct 22 '18 at 20:03


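I don't see any point in indexing 1-character grams, and I think it makes sense to cap your grams at, let's say, 6 characters. A single-character gram such as "a" is shared by a very large part of the index (133 million documents in your case), so a one-letter query has a huge set of matches to go through.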

You have:

"min_gram": "1",
"max_gram": "25"

Better to have:

"min_gram": "2",
"max_gram": "6"
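To make the effect of the two settings concrete, here is a minimal Python sketch of what an edge n-gram filter produces for a single token. It is a simplified model for illustration, not Elasticsearch's implementation; `edge_ngrams` is a hypothetical helper, and the sample tokens are made up.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` with length min_gram..max_gram (simplified model)."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

# Settings from the question: every token is indexed under its first letter too.
original = edge_ngrams("apple", 1, 25)

# Suggested settings: no 1-character gram, and at most 6 characters per gram.
suggested = edge_ngrams("apple", 2, 6)

# A token longer than max_gram is only indexed up to its first 6 characters.
long_term = edge_ngrams("elasticsearch", 2, 6)

print(original)
print(suggested)
print(long_term)
```

Two consequences are worth keeping in mind with the suggested values. A one-letter query such as "a" no longer matches anything, because no 1-character gram is indexed. And a search term longer than 6 characters, analyzed with `my_standard`, will not equal any indexed gram unless the search side is also limited to 6 characters, for example with a `truncate` token filter in the search analyzer.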

answered Oct 22 '18 at 19:43 by Yuri Steps · edited Oct 22 '18 at 20:03 by scopchanov
