
I'm trying to run a bucket aggregation in Elasticsearch that only operates on the results returned by the query.

It seems like the aggregation runs over every hit but only returns a portion of them. That would be fine, but the problem is that the documents returned from the aggregation don't match the documents returned from the query.

Here is the mapping:

LOCATION_MAPPING = {
  id: { type: 'long' },
  name: { type: 'text' },
  street: { type: 'text' },
  city: { type: 'text' },
  state: { type: 'text' },
  zip: { type: 'text' },
  price: { type: 'text' },
  geolocation: { type: 'geo_point' },
  amenities: { type: 'nested' },
  reviews: { type: 'nested' },
};

Here is the query:

{
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "1000yd",
          "geolocation": [
            -73.990768410025,
            40.713144830193
          ]
        }
      },
      "must": {
        "multi_match": {
          "query": "new york",
          "fields": [
            "name^2",
            "city",
            "state",
            "zip"
          ],
          "type": "best_fields"
        }
      }
    }
  },
  "aggs": {
    "reviews": {
      "nested": {
        "path": "reviews"
      },
      "aggs": {
        "location": {
          "terms": {
            "field": "reviews.locationId"
          },
          "aggs": {
            "avg_rating": {
              "avg": {
                "field": "reviews.rating"
              }
            }
          }
        }
      }
    }
  }
}
user3791980

2 Answers


The following resources should help explain the behavior you are observing and answer your questions:

It seems like the aggregation runs over every hit but only returns a portion of them.

Yes. By default, the terms aggregation in your request returns only the top 10 buckets; you can change that with the size parameter. (In Elasticsearch versions before 5.0, size 0 returned all buckets; later versions require an explicit positive size.) See the related post Show all Elasticsearch aggregation buckets.
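
As a rough sketch of the size parameter mentioned above, the aggregation from the question could be built in Python with an explicit bucket limit. The helper name build_reviews_agg and the value 50 are illustrative assumptions, not part of the original request.

```python
import json


def build_reviews_agg(bucket_count=50):
    """Return the nested terms aggregation from the question with an explicit bucket limit."""
    return {
        "reviews": {
            "nested": {"path": "reviews"},
            "aggs": {
                "location": {
                    "terms": {
                        "field": "reviews.locationId",
                        # Without "size", only the top 10 buckets are returned.
                        "size": bucket_count,
                    },
                    "aggs": {
                        "avg_rating": {"avg": {"field": "reviews.rating"}}
                    },
                }
            },
        }
    }


aggs = build_reviews_agg(50)
print(json.dumps({"aggs": aggs}, indent=2))
```

The printed JSON can be merged into the "aggs" section of the original request body.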

the problem is that the documents returned from the aggregation don't match the documents returned from the query.

In the Elasticsearch response, you should see the top 10 scoring hits (there is also a size parameter at the root level of the request that defaults to 10; see the Elasticsearch From/Size docs) and the top 10 buckets for your aggregation. The top-scoring hits may not be the ones with the most common reviews.locationId.
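
To make the root-level from/size parameters concrete, here is a minimal sketch that adds them to a request body. The helper with_paging and the page size of 50 are hypothetical; the query is a trimmed version of the one in the question.

```python
import json


def with_paging(body, offset=0, page_size=10):
    """Return a copy of the request body with root-level from/size set."""
    paged = dict(body)  # shallow copy so the caller's dict is not modified
    paged["from"] = offset
    paged["size"] = page_size
    return paged


search_body = {
    "query": {
        "multi_match": {
            "query": "new york",
            "fields": ["name^2", "city", "state", "zip"],
            "type": "best_fields",
        }
    }
}

# Ask for 50 hits instead of the default 10.
paged_body = with_paging(search_body, offset=0, page_size=50)
print(json.dumps(paged_body, indent=2))
```

Note that the root-level size only controls how many hits are returned; the aggregation is still computed over every document that matches the query, so the hits and the buckets can legitimately cover different documents.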

I think your options are:

eemp

Aggregating on the query result set should be possible, and according to the docs the syntax should look like yours.

In my case I was doing a GET _search whose query included a query_string containing lowercase "or" and "and" (wrong; they should be uppercase OR and AND, see the aforementioned docs). This seems to cause all documents to be matched instead of only the expected ones.

Below is the incorrect query_string; with it, the aggregation runs on all documents:

GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "(description: \"my description\" or myField: \"my value\") and myOtherField: \"my other value\""
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-2h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "myAgg": {
      "terms": {
        "field": "myOtherField"
      }
    }
  }
}

For reasons that escape me, the field I was aggregating on was not shown as part of the query results in Kibana's Dev Tools. That, together with the fact that lowercase "and" and "or" work as expected in Kibana's "Discover" (which uses KQL), made it harder to trace the problem to the query rather than the aggregation.

So if your aggregation does not seem to be aggregating on the query results, double-check the query itself.
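
For comparison, a sketch of the same request with the operators written in uppercase, built as a Python dict. The helper build_query_string_search is hypothetical; the field names are the placeholders used in the query above.

```python
import json


def build_query_string_search(query_string, agg_field):
    """Build a search body that combines a query_string, a time range, and a terms aggregation."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": query_string}},
                    {"range": {"@timestamp": {"gte": "now-2h"}}},
                ]
            }
        },
        "aggs": {"myAgg": {"terms": {"field": agg_field}}},
    }


# Boolean operators in query_string syntax are written in uppercase.
corrected = (
    '(description: "my description" OR myField: "my value") '
    'AND myOtherField: "my other value"'
)

body = build_query_string_search(corrected, "myOtherField")
print(json.dumps(body, indent=2))
```

With the operators in uppercase, the query should match only the intended documents, and the aggregation then runs on that smaller result set.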

afarah
  • Another source of error: `my_field < 300` vs `my_field:<300`. See the [query string mini language docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax) – afarah Jul 03 '23 at 21:25