Obtaining boolean clause match counts in Elasticsearch

Question

We have some Elasticsearch queries that take the following form:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "content",
            "query": "Lorem ipsum dolor sit amet"
          }
        },
        {
          "query_string": {
            "default_field": "content",
            "query": "Nunc ac auctor massa"
          }
        }
      ]
    }
  }
}

We'd like to surface the hit count for each boolean clause. Our current brute-force approach is to run a secondary multi-search under the covers, splitting each clause into its own query to get the individual counts. This can get very expensive: we support up to 50 of these clauses, which can mean another 50 queries executed behind the scenes.

We've looked for alternative ways to extract counts, such as Get matched terms from Lucene query or lucene get matched terms in query, but all of them involve bean counting the actual hits. This is prohibitive, as we can potentially have thousands of them.

Is there another, more efficient approach or technique (preferably in Elasticsearch) for getting those counts that we might have missed?

score 2 · Accepted Answer · answered May 20 '16 at 20:15

Adding a filters aggregation with one named filter per clause might do it:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "content",
            "query": "Lorem ipsum dolor sit amet"
          }
        },
        {
          "query_string": {
            "default_field": "content",
            "query": "Nunc ac auctor massa"
          }
        }
      ]
    }
  },
  "aggs": {
    "2": {
      "filters": {
        "filters": {
          "message:fake": {
            "query": {
              "query_string": {
                "query": "content:\"Lorem ipsum dolor sit amet\"",
                "analyze_wildcard": true
              }
            }
          },
          "message:data": {
            "query": {
              "query_string": {
                "query": "content:\"Nunc ac auctor massa\"",
                "analyze_wildcard": true
              }
            }
          }
        }
      }
    }
  }
}

That way the response tells you how many of the matching documents each clause matches on its own.
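
With up to 50 clauses, the request body is easier to generate than to write by hand. The following is a minimal Python sketch of that idea, not a definitive implementation. The helper names `build_request` and `clause_counts`, the aggregation name `clause_counts`, and the bucket names `clause_0`, `clause_1`, ... are all made up for illustration. It assumes Elasticsearch 2.x or later, where a query can be used directly as a filter; on 1.x each filter would need the `query` wrapper shown above. It also reuses each clause unchanged as its filter, whereas the request above quotes the text, which turns it into a phrase match and may give different counts than the clause itself.

```python
import json


def build_request(clauses, field="content", agg_name="clause_counts"):
    """Build one search body: a bool/should query plus a filters
    aggregation with one named bucket per clause."""

    def as_query(text):
        # Same query_string shape as the should clauses in the question.
        return {"query_string": {"default_field": field, "query": text}}

    return {
        "query": {"bool": {"should": [as_query(c) for c in clauses]}},
        "aggs": {
            agg_name: {
                "filters": {
                    # Hypothetical bucket names: clause_0, clause_1, ...
                    "filters": {
                        f"clause_{i}": as_query(c)
                        for i, c in enumerate(clauses)
                    }
                }
            }
        },
    }


def clause_counts(response, agg_name="clause_counts"):
    """Map each bucket name to its doc_count. Named filters come back
    as an object keyed by filter name."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


if __name__ == "__main__":
    body = build_request(["Lorem ipsum dolor sit amet", "Nunc ac auctor massa"])
    # Send this body to the _search endpoint with whatever client you use.
    print(json.dumps(body, indent=2))
```

Passing the parsed search response to `clause_counts` returns a dict keyed by bucket name. Because a bool query with only should clauses requires at least one of them to match, every document a clause matches is inside the aggregation's scope, so each bucket count should equal what the standalone query would have returned.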

Excellent! This should do it! – gstathis, May 20 '16 at 20:41
