I am new to Elasticsearch and I am confused about the difference between must and filter. I want to perform an AND operation between my terms, so I did this:

POST /xyz/_search

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "city": "city1"
                    }
                },
                {
                    "term": {
                        "saleType": "sale_type1"
                    }
                }
            ]
        }
    }
}

This gave me the required results, matching both terms. When I use filter like this:

POST /xyz/_search

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "city": "city1"
                    }
                }
            ],
            "filter": {
                "term": {
                    "saleType": "sale_type1"
                }
            }
        }
    }
}

I get the same result, so when should I use must and when should I use filter? What is the difference?

Custodio
Krash

2 Answers

Clauses in must contribute to the score. Clauses in filter are executed in filter context, so their score is ignored.

In both must and filter, the clause (query) has to match for a document to be returned. This is why you get the same results.
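
To make the point above concrete, here is a toy in-memory sketch in Python. It is not Elasticsearch code: the `toy_bool_search` helper, the sample documents, and the flat 1.0-per-clause score are all made up for illustration. It only shows that moving a clause from must to filter keeps the set of hits the same while changing the score.

```python
# Toy in-memory model of a bool query, for illustration only.
# This is not Elasticsearch code and the scores are placeholders.
DOCS = [
    {"id": 1, "city": "city1", "saleType": "sale_type1"},
    {"id": 2, "city": "city1", "saleType": "sale_type2"},
    {"id": 3, "city": "city2", "saleType": "sale_type1"},
]


def toy_bool_search(docs, must=(), filter_clauses=()):
    """Return (id, score) pairs for docs matching every must AND every filter clause."""
    required = list(must) + list(filter_clauses)
    hits = []
    for doc in docs:
        if all(doc.get(field) == value for field, value in required):
            # Only must clauses add to the score; 1.0 per clause is a made-up value.
            hits.append((doc["id"], 1.0 * len(must)))
    return hits


both_in_must = toy_bool_search(
    DOCS, must=[("city", "city1"), ("saleType", "sale_type1")]
)
must_plus_filter = toy_bool_search(
    DOCS, must=[("city", "city1")], filter_clauses=[("saleType", "sale_type1")]
)

print(both_in_must)      # same document ids as the next line...
print(must_plus_filter)  # ...but a different score
```

In a real cluster the scores come from the index's similarity function rather than a constant, but the matching behaviour is the same: both variants return the same documents.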

You may check this link

Score

The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

A query clause generates a _score for each document.

To learn how the score is calculated, refer to this link

Community
Vijay
  • Doesn't really answer when must vs. filter should be used. As in, when is having a score important for anything? – intiha Jan 09 '19 at 08:11
  • @intiha Use `filter` for faster searches, because there is no score to compute and no ranking to be done. Scoring becomes important when you care about, say, relevance due to the number of occurrences of your search term or the length of your matched document, or if you want to add `boost` to your query to uprank matching documents. – Cardin Apr 01 '19 at 02:08
  • @intiha Score is not important if all your queries are boolean. Clauses used in filters may be cached: "Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching." Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html – Paul Apr 02 '19 at 21:28
  • I think it also matters when you have a compound query where the score of the inner query is used to calculate the overall score. Same for `must_not`. – moon Apr 01 '21 at 15:48
  • Just adding to @Cardin's comment. I ran the same query on a relatively small dataset (2k entries), the first time using 'must' and the second time using a filter. Using 'must', it took 20ms, while using a filter it took 6ms. Avoid using 'must' when all you need is a 'filter'. – Tiago Duque Nov 07 '22 at 15:19
  • @Cardin Your comment should be the best answer to this question. Try to write it out as an answer. – Yahya Jan 07 '23 at 14:48

must returns a score for every matching document. This score lets you rank the matching documents and compare their relative relevance (using the magnitude of each document's score).

With this, one can say, roughly, that Doc 1 is 3 times more relevant than Doc 2, or that Docs 1 to 7 are much more relevant than Doc 8 onwards.

For how the relative score is determined, you can refer to the references below.
Briefly, it is related to the number of term occurrences in the document, the document length compared with the average document length, and how common the term is across your index.
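
As a rough illustration of those factors, here is a simplified single-term BM25 sketch in Python. It assumes the index uses BM25 (the default similarity in recent Elasticsearch versions) with the usual defaults k1 = 1.2 and b = 0.75. The function name and the sample numbers are hypothetical, and the exact formula in your version may differ, so treat the `explain` output of a real query as the authority.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, docs_with_term, total_docs,
                    k1=1.2, b=0.75):
    """Simplified BM25 score of one term in one document (illustrative sketch)."""
    # Rarer terms (found in fewer documents) get a higher inverse document frequency.
    idf = math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Documents longer than average are penalised, shorter ones are favoured.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # More occurrences raise the score, with diminishing returns.
    tf = freq / (freq + k1 * length_norm)
    return idf * tf


# Made-up numbers: 1000 documents, average length 100 terms.
baseline = bm25_term_score(freq=1, doc_len=100, avg_doc_len=100,
                           docs_with_term=10, total_docs=1000)
more_occurrences = bm25_term_score(freq=3, doc_len=100, avg_doc_len=100,
                                   docs_with_term=10, total_docs=1000)
longer_document = bm25_term_score(freq=1, doc_len=300, avg_doc_len=100,
                                  docs_with_term=10, total_docs=1000)
rarer_term = bm25_term_score(freq=1, doc_len=100, avg_doc_len=100,
                             docs_with_term=2, total_docs=1000)

print(baseline, more_occurrences, longer_document, rarer_term)
```

With these inputs, more occurrences and a rarer term both score above the baseline, while the longer document scores below it.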


filter doesn't contribute a score. All one can say is that every matching document is relevant, but it won't help in evaluating whether one is more relevant than another. You can think of filter as a must with only 2 scores: zero or non-zero, where all zero-scored documents are dropped.

filter is helpful if you just want to whitelist/blacklist documents, e.g., keep all documents belonging to the topic "pets".


In summary, there are 3 points that will help you in deciding when to use what:

  1. must is the one to use when comparing/ranking documents by relevance
  2. filter if you don't care about scores/ranks
  3. filter is generally faster because Elasticsearch doesn't need to compute a score, and filter clauses can be cached
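
Applied to the query from the question, these points can be sketched as a small Python helper that builds the request body. The `build_search_body` function is a hypothetical convenience, not part of any Elasticsearch client, and the `description` field in the second example is an assumed analyzed text field that does not appear in the question.

```python
import json


def build_search_body(scored_clauses, filter_clauses):
    """Build a bool query: ranked clauses go in must, yes/no clauses go in filter."""
    bool_query = {}
    if scored_clauses:
        bool_query["must"] = list(scored_clauses)
    if filter_clauses:
        bool_query["filter"] = list(filter_clauses)
    return {"query": {"bool": bool_query}}


# Exact-match AND, as in the question: no ranking needed, so both terms go in filter.
exact_body = build_search_body(
    scored_clauses=[],
    filter_clauses=[
        {"term": {"city": "city1"}},
        {"term": {"saleType": "sale_type1"}},
    ],
)

# Ranked full-text search narrowed by an exact-match filter.
# "description" is a hypothetical analyzed text field, not from the question.
ranked_body = build_search_body(
    scored_clauses=[{"match": {"description": "sea view"}}],
    filter_clauses=[{"term": {"city": "city1"}}],
)

print(json.dumps(exact_body, indent=2))
print(json.dumps(ranked_body, indent=2))
```

Either printed body can be sent as the payload of POST /xyz/_search. With the filter-only body, expect every hit to come back with a _score of 0, since no clause runs in query context.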

References:

Cardin