I'm trying to find a way to return the results of only one aggregation in an Elasticsearch query. I have a max bucket aggregation (the one I want to see) that is calculated from a sum aggregation nested in a date histogram aggregation. Right now, I have to go through 1,440 buckets to get to the one result I want to see. I've already removed the hits of the base query with "size": 0, but is there a way to do something similar for the aggregations? I've tried slipping the same setting into a few places with no luck.

Here's the query:

{
    "size": 0,
    "query": {
        "range": {
            "timestamp": {
                "gte": "2018-11-28",
                "lte": "2018-11-28"
            }
        }
    },
    "aggs": {
        "hits_per_minute": {
            "date_histogram": {
                "field": "timestamp",
                "interval": "minute"
            },
            "aggs": {
                "total_hits": {
                    "sum": {
                        "field": "hits_count"
                    }
                }
            }
        },
        "max_transactions_per_minute": {
            "max_bucket": {
                "buckets_path": "hits_per_minute>total_hits"
            }
        }
    }
}
Ant
  • May I ask which version of Elasticsearch you are using? It is not clear from the question which of those 1440 buckets you actually want to see; could you clarify that? It would help if you posted a couple of example documents and the desired output. Thank you. – Nikolay Vasiliev Dec 03 '18 at 19:06
  • 6.4.3. The 1440 refers to the number of minutes in a day; the query above produces a set of sum metrics in a date histogram by minute, and the max bucket aggregation produces a single metric based on those metrics. Regardless of the order in which I list the aggregations in the query body, it always places the max bucket at the very end of the response body. The document content is irrelevant. – Ant Dec 03 '18 at 20:07
  • If you don't want `hits_per_minute` agg then simply remove it. It won't affect the `max_transactions_per_minute` aggregation as both are independent of each other. – Nishant Dec 04 '18 at 00:17

1 Answer

Fortunately, you can do that with the bucket_sort pipeline aggregation, which is available in Elasticsearch 6.4.

Do it with bucket_sort

POST my_index/doc/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2018-11-28",
        "lte": "2018-11-28"
      }
    }
  },
  "aggs": {
    "hits_per_minute": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "minute"
      },
      "aggs": {
        "total_hits": {
          "sum": {
            "field": "hits_count"
          }
        },
        "max_transactions_per_minute": {
          "bucket_sort": {
            "sort": [
              {"total_hits": {"order": "desc"}}
            ],
            "size": 1
          }
        }
      }
    }
  }
}

This will give you a response like this:

{
  ...
  "aggregations": {
    "hits_per_minute": {
      "buckets": [
        {
          "key_as_string": "2018-11-28T21:10:00.000Z",
          "key": 1543439400000,
          "doc_count": 3,
          "total_hits": {
            "value": 11
          }
        }
      ]
    }
  }
}

Note that there is no extra aggregation in the output, and the output of hits_per_minute is truncated to a single bucket (because we asked for exactly one bucket, the topmost one).
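To make the effect of the sort and size settings concrete, here is a rough client-side sketch in Python of what that bucket_sort configuration amounts to: order the buckets by total_hits descending and keep the first one. The function name and the sample buckets are made up for illustration; this is not how Elasticsearch implements it, only an imitation of the observable result.

```python
from typing import Any, Dict, List


def bucket_sort(buckets: List[Dict[str, Any]], metric: str, size: int) -> List[Dict[str, Any]]:
    """Imitate bucket_sort: order buckets by a single-value metric
    (descending) and keep only the first `size` of them."""
    ranked = sorted(buckets, key=lambda b: b[metric]["value"], reverse=True)
    return ranked[:size]


# Hypothetical per-minute buckets, shaped like date_histogram output.
sample_buckets = [
    {"key_as_string": "2018-11-28T21:09:00.000Z", "doc_count": 2, "total_hits": {"value": 7}},
    {"key_as_string": "2018-11-28T21:10:00.000Z", "doc_count": 3, "total_hits": {"value": 11}},
    {"key_as_string": "2018-11-28T21:11:00.000Z", "doc_count": 1, "total_hits": {"value": 4}},
]

# Same settings as in the query: sort by total_hits desc, size 1.
top = bucket_sort(sample_buckets, "total_hits", size=1)
print(top[0]["key_as_string"], top[0]["total_hits"]["value"])
```

Because the truncation happens inside hits_per_minute itself, the remaining 1,439 buckets never reach the response.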

Do it with filter_path

There is also a generic way to filter the output of Elasticsearch: Response filtering, as this answer suggests.

In this case it will be enough to just do the following query:

POST my_index/doc/_search?filter_path=aggregations.max_transactions_per_minute
{ ... (original query) ... }

That would give the response:

{
  "aggregations": {
    "max_transactions_per_minute": {
      "value": 11,
      "keys": [
        "2018-11-28T21:10:00.000Z"
      ]
    }
  }
}
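For intuition, the pruning that filter_path performs on the response can be imitated on the client with a few lines of Python. This is only a sketch under simplifying assumptions: keep_path is a hypothetical helper, it handles a single dotted path without wildcards, and the sample response values are invented.

```python
from typing import Any, Dict


def keep_path(doc: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Return a copy of `doc` that contains only the branch named by a
    dotted path such as 'aggregations.max_transactions_per_minute'."""
    head, _, rest = path.partition(".")
    if head not in doc:
        return {}
    if not rest:
        return {head: doc[head]}
    child = doc[head]
    if not isinstance(child, dict):
        return {}
    pruned = keep_path(child, rest)
    return {head: pruned} if pruned else {}


# Hypothetical unfiltered response (only two of the 1,440 buckets shown).
full_response = {
    "took": 5,
    "timed_out": False,
    "hits": {"total": 42, "hits": []},
    "aggregations": {
        "hits_per_minute": {
            "buckets": [
                {"key_as_string": "2018-11-28T21:09:00.000Z", "total_hits": {"value": 7}},
                {"key_as_string": "2018-11-28T21:10:00.000Z", "total_hits": {"value": 11}},
            ]
        },
        "max_transactions_per_minute": {
            "value": 11,
            "keys": ["2018-11-28T21:10:00.000Z"],
        },
    },
}

filtered = keep_path(full_response, "aggregations.max_transactions_per_minute")
print(filtered)
```

With filter_path the server does this pruning before sending the response, so the per-minute buckets are still computed but are not transferred.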
Nikolay Vasiliev