
I can't see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?

Raedwald
Jonesie
  • Official documentation is not very clear, in fact. – finiteautomata Jan 09 '14 at 18:01
  • It looks like a page with a more detailed explanation has appeared: https://www.elastic.co/guide/en/elasticsearch/guide/master/_queries_and_filters.html – Dmitry Polushkin May 19 '15 at 11:45
  • Worth noting that [queries and filters will be merged](https://www.elastic.co/guide/en/elasticsearch/reference/2.0/_query_dsl_changes.html) in ES 2.0, hence most of what's been said and written for queries vs filters will not apply anymore. Also check the [official blog post](https://www.elastic.co/blog/better-query-execution-coming-elasticsearch-2-0) announcing this change. – Val Oct 26 '15 at 06:06

8 Answers


The difference is simple: filters are cached and don't influence the score, and are therefore faster than queries. Have a look here too. Let's say a query is usually something that the user types and is pretty much unpredictable, while filters help users narrow down the search results, for example using facets.
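A minimal sketch of that split in a single request, assuming Elasticsearch 2.0 or later (where `bool` has a `filter` clause) and a hypothetical index with a full-text `title` field and an exact-value `brand` field (both names are illustrative, not from this answer). What the user types goes into the scored `must` clause; facet selections go into the non-scoring `filter` clause. In current versions, facet counts come from aggregations (the old facets API was removed in 2.0); here the `terms` aggregation counts brands over the documents that match the query and filter. The Python only builds and prints the JSON body you would send to `_search`.

```python
import json


def faceted_search(user_text, selected_brands):
    """User text is scored; facet selections only narrow the result set.

    `title` and `brand` are hypothetical fields (`brand` assumed exact-value).
    """
    # Query context: whatever the user typed, ranked by relevance.
    bool_query = {"must": [{"match": {"title": user_text}}]}
    if selected_brands:
        # Filter context: facet clicks are plain yes/no restrictions.
        bool_query["filter"] = [{"terms": {"brand": list(selected_brands)}}]
    return {
        "query": {"bool": bool_query},
        # Per-brand counts over the documents that match query + filter.
        "aggs": {"brands": {"terms": {"field": "brand"}}},
    }


if __name__ == "__main__":
    print(json.dumps(faceted_search("usb cable", ["acme"]), indent=2))
```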

javanna
  • Right, so if the user is doing a Google-type search, I would use a query? If they are selecting a possible value from a drop-down (e.g. invoice count > 50), this would be a filter? – Jonesie Jan 30 '13 at 20:25
  • Yep, that's exactly right. Any time you need to restrict the entire set of documents by some metric, a filter is usually appropriate. So maybe by age, length, size, etc. – Zach Jan 30 '13 at 20:44
  • My solution uses filters and queries in the same request and it is super fast on the test database. We will soon get the live data in there to see how fast it really is. – Jonesie Mar 07 '13 at 05:59
  • @Zach To be absolutely clear: in a multi-tenant system (with permissions for users within a tenant), it sounds like the tenant/authentication information would be a filter added to every query (i.e. a Filtered Query). Right? – Scott Willeke Aug 19 '13 at 19:46
  • @activescott Yep, that's what I would do. You can also set up filtered aliases so that "user aliases" always apply the appropriate filter. That makes administration easier and doesn't require code changes to update queries, extra cruft in your query, etc. – Zach Aug 20 '13 at 15:57
  • We use 'function_score', where you can set a query or a filter. We only set a filter. In the functions part you can also define filters which influence the score of your results. – Stillmatic1985 Mar 21 '14 at 13:31
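A sketch of the two ideas from these comments as request bodies built in Python, assuming a recent Elasticsearch; the index name, the alias naming scheme, and the `tenant_id` and `category` fields are hypothetical. A filtered alias, sent to `POST /_aliases`, makes every search through that alias carry the tenant filter. A `function_score` query whose `functions` entry has only a `filter` leaves every document in the results but raises the score of those matching the filter.

```python
import json


def tenant_alias_actions(index, tenant_id):
    """Body for POST /_aliases: searches through the new alias always
    carry the tenant filter. Alias naming and `tenant_id` are hypothetical."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"{index}_tenant_{tenant_id}",
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        ]
    }


def boost_category(value, weight):
    """function_score with a filter-only function: every document matches
    (match_all); documents whose hypothetical `category` field equals
    `value` get their score multiplied by `weight` (default modes)."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {"filter": {"term": {"category": value}}, "weight": weight}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(tenant_alias_actions("invoices", "42"), indent=2))
    print(json.dumps(boost_category("pizza", 2), indent=2))
```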

This is what the official documentation says:

As a general rule, filters should be used instead of queries:

  • for binary yes/no searches
  • for queries on exact values

As a general rule, queries should be used instead of filters:

  • for full text search
  • where the result depends on a relevance score
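For the "exact values, yes/no" case, a minimal sketch is a `constant_score` query wrapping a `terms` filter: every hit gets the same score, so no relevance is computed. This assumes a hypothetical `status` field indexed as an exact value (a `keyword` field in recent versions); the Python only builds the request body.

```python
import json


def exact_value_search(field, values):
    """constant_score + terms: a yes/no match on exact values.

    Every hit gets the same score, so no relevance is computed.
    """
    return {
        "query": {
            "constant_score": {
                # Matches documents whose field equals any of the values.
                "filter": {"terms": {field: list(values)}}
            }
        }
    }


if __name__ == "__main__":
    # `status` is a hypothetical exact-value field.
    print(json.dumps(exact_value_search("status", ["paid", "overdue"]), indent=2))
```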
igo
  • When I want to delete a document, should I use a filter if possible? I don't want it to be cached. – Rytek Dec 03 '14 at 13:26
  • When deleting a doc, you do not require any score, nor do you need to do a full-text search. So this would be a filter then, as you just need to make a delete/not delete decision. [filter-query-context](https://www.elastic.co/guide/en/elasticsearch/reference/7.3/query-filter-context.html) – nonNumericalFloat Jan 29 '20 at 10:51
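A sketch of such a delete, assuming Elasticsearch 5.0 or later, where `POST /<index>/_delete_by_query` is built in, and a hypothetical `created_at` date field. The criteria sit in a `bool` `filter`, since only a delete/keep decision is needed and no score is computed.

```python
import json


def delete_older_than(cutoff_date):
    """Body for POST /<index>/_delete_by_query (Elasticsearch 5.0+).

    `created_at` is a hypothetical date field. The condition is a pure
    yes/no test, so it goes in filter context: no scores are computed.
    """
    return {
        "query": {
            "bool": {
                "filter": [{"range": {"created_at": {"lt": cutoff_date}}}]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(delete_older_than("2020-01-01"), indent=2))
```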

An example (try it yourself)

Say the index myindex contains three documents:

curl -XPOST localhost:9200/myindex/mytype -H 'Content-Type: application/json' -d '{ "msg": "Hello world!" }'
curl -XPOST localhost:9200/myindex/mytype -H 'Content-Type: application/json' -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST localhost:9200/myindex/mytype -H 'Content-Type: application/json' -d '{ "msg": "Hi Stack Overflow!" }'

Query: How well a document matches the query

Query "hello sam" (using the must clause)

curl 'localhost:9200/myindex/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'

Document "Hello world! I am Sam." is assigned a higher score than "Hello world!", because the former matches both words in the query. Documents are scored.

"hits" : [
   ...
     "_score" : 0.74487394,
     "_source" : {
       "msg" : "Hello world! I am Sam."
     }
   ...
     "_score" : 0.22108285,
     "_source" : {
       "msg" : "Hello world!"
     }
   ...
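To see why the two-word match ranks first, here is a toy model of query context in Python. It is deliberately simplified: it splits text into lowercase words and scores a document by how many distinct query terms it contains, whereas Elasticsearch's real relevance scoring (TF/IDF in older versions, BM25 by default since 5.0) also weighs term frequency, term rarity, and field length. As with a `match` query and its default `or` operator, a document needs at least one of the terms to be returned.

```python
import re

DOCS = ["Hello world!", "Hello world! I am Sam.", "Hi Stack Overflow!"]


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, keep runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_query(docs, query_text):
    """Query context (toy): a doc matches if it contains any query term,
    and its score is the number of distinct query terms it contains."""
    terms = tokenize(query_text)
    hits = []
    for doc in docs:
        score = len(terms & tokenize(doc))
        if score > 0:
            hits.append((score, doc))
    # Best match first, the way hits are ordered by _score.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    for score, doc in toy_query(DOCS, "hello sam"):
        print(score, doc)
```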

Filter: Whether a document matches the query

Filter "hello sam" (using the filter clause)

curl 'localhost:9200/myindex/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'

Documents that contain either hello or sam are returned. Documents are NOT scored.

"hits" : [
   ...
     "_score" : 0.0,
     "_source" : {
       "msg" : "Hello world!"
     }
   ...
     "_score" : 0.0,
     "_source" : {
       "msg" : "Hello world! I am Sam."
     }
   ...
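The filter side can be sketched with the same kind of toy model (again a simplification, not Elasticsearch's implementation): a document either contains one of the terms or it doesn't, nothing is ranked, and every hit gets the same score of 0.0. The hits come back in list order here; with equal scores you shouldn't rely on any particular order from Elasticsearch.

```python
import re

DOCS = ["Hello world!", "Hello world! I am Sam.", "Hi Stack Overflow!"]


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, keep runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_filter(docs, query_text):
    """Filter context (toy): yes/no membership only, constant score 0.0."""
    terms = tokenize(query_text)
    return [(0.0, doc) for doc in docs if terms & tokenize(doc)]


if __name__ == "__main__":
    for score, doc in toy_filter(DOCS, "hello sam"):
        print(score, doc)
```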

Unless you need full text search or scoring, filters are preferred because frequently used filters will be cached automatically by Elasticsearch, to speed up performance. See Elasticsearch: Query and filter context.

kgf3JfUtW

Filters -> Does this document match? A binary yes or no answer.

Queries -> Does this document match? How well does it match? Uses scoring.

Emmanuel Osimosu

A few more additions to the above. A filter is applied first and then the query is processed over its results. To store the binary true/false match per document, a data structure called a bitset is used. This bitset is kept in memory and is reused from the second time the filter is queried onwards. This way, using the bitset data structure, we are able to take advantage of the cached result.

One more point to note here: the filter cache is populated only when the request is executed, hence we only get the advantage of caching from the second hit onwards.

But you can use the warmer API to get around this. When you register a query with a filter using the warmer API, it is executed against each new segment whenever that segment comes live. Hence we get consistent speed from the very first execution.
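The bitset idea can be illustrated with a toy cache in Python. This is a sketch, not Lucene's actual code: real Elasticsearch caches per segment and has its own policy for which filters are worth caching. The first evaluation of a filter computes one bit per document and stores it under a key; later requests with the same key read the bits back, which is why only the second and later hits benefit. The key string and the `rating` field are hypothetical.

```python
class ToyFilterCache:
    """Toy filter cache: one bitset (a Python int) per filter key."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}  # filter key -> bitset of matching doc ids
        self.misses = 0  # how many times a bitset had to be computed

    def bitset(self, key, predicate):
        # Second and later calls with the same key reuse the stored bitset.
        if key in self.cache:
            return self.cache[key]
        # First call: evaluate the yes/no predicate on every document.
        self.misses += 1
        bits = 0
        for doc_id, doc in enumerate(self.docs):
            if predicate(doc):
                bits |= 1 << doc_id
        self.cache[key] = bits
        return bits

    def matching_ids(self, key, predicate):
        bits = self.bitset(key, predicate)
        return [i for i in range(len(self.docs)) if (bits >> i) & 1]


if __name__ == "__main__":
    def rated_4_plus(doc):
        return doc["rating"] >= 4.0

    cache = ToyFilterCache([{"rating": 4.5}, {"rating": 3.0}, {"rating": 4.1}])
    print(cache.matching_ids("rating>=4.0", rated_4_plus), cache.misses)  # builds bitset
    print(cache.matching_ids("rating>=4.0", rated_4_plus), cache.misses)  # reuses it
```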

Vineeth Mohan
  • Interesting! I didn't realise filters happen before queries. The caching of filters makes more sense now. – Constant Meiring Apr 15 '15 at 19:46
  • Not always. That is the basic and primary difference between the filtered and constant score queries. Constant score always executes the query first and then applies the filter over it. Even the filtered query has settings by which the query can execute before the filters. – piyushGoyal Apr 30 '15 at 17:24

Basically, a query is used when you want to search your documents with scoring, and filters are used to narrow down the set of results obtained by the query. Filters are boolean: a document either passes them or it doesn't.

For example, say you have an index of restaurants, something like Zomato. Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.

So you use a query to find all the documents containing "pizza", and you get some results.

Now say you want a list of restaurants that serve pizza and have a rating of at least 4.0.

So what you have to do is use the keyword "pizza" in your query and apply a filter on the rating for at least 4.0.

What happens is that filters are usually applied to the results obtained by querying your index.
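The pizza example as a request body, sketched in Python under the assumption of Elasticsearch 2.0 or later and a hypothetical restaurants index with a full-text `menu` field and a numeric `rating` field: the keyword is a scored `match` in `must`, and "rating of at least 4.0" is a `range` filter with `gte`.

```python
import json


def restaurant_search(keyword, min_rating):
    """Keyword is scored (query context); rating only filters (filter context).

    `menu` and `rating` are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"menu": keyword}}],
                "filter": [{"range": {"rating": {"gte": min_rating}}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(restaurant_search("pizza", 4.0), indent=2))
```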

Utsav T
Rahul Bhanushali

Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.

Source: https://logz.io/blog/elasticsearch-queries/
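A sketch of what "depending on the context" means, using a hypothetical `status` field: the same clause placed under `bool.must` is scored, and placed under `bool.filter` it only decides match or no match. For comparison, the 1.x-style `filtered` query is shown too; it was deprecated in 2.0 and removed in 5.0, so the `bool` form is the one to use on current versions.

```python
import json

# A single clause; `status` and "published" are hypothetical.
CLAUSE = {"term": {"status": "published"}}

# Query context: the clause contributes to _score.
as_query = {"query": {"bool": {"must": [CLAUSE]}}}

# Filter context: the same clause only decides match / no match.
as_filter = {"query": {"bool": {"filter": [CLAUSE]}}}

# 1.x style, for comparison only (deprecated in 2.0, removed in 5.0).
legacy_filtered = {
    "query": {"filtered": {"query": {"match_all": {}}, "filter": CLAUSE}}
}

if __name__ == "__main__":
    for name, body in [("query", as_query), ("filter", as_filter), ("legacy", legacy_filtered)]:
        print(name, json.dumps(body))
```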


Queries: calculate a score, so they're able to return results sorted by relevance. Filters: don't calculate a score, making them faster and easier to cache.

mostafa kazemi