Elasticsearch aggregation on a field with dynamic properties

Question

Given the following document, where variants is a nested type and options is a flattened type:

{
  "doc_type" : "product",
  "id" : 1,
  "variants" : [
    {
      "options" : {
        "Size" : "XS"
      },
      "price" : 1
    },
    {
      "options" : {
        "Size" : "S",
        "Material" : "Wool"
      },
      "price" : 6.99
    }
  ]
}

I want to run an aggregation that produces data in the following format:

{
  "variants.options.Size": {
    "buckets" : [
      {
        "key" : "XS",
        "doc_count" : 1
      },
      {
        "key" : "S",
        "doc_count" : 1
      }
    ]
  },
  "variants.options.Material": {
    "buckets" : [
      {
        "key" : "Wool",
        "doc_count" : 1
      }
    ]
  }
}

I could very easily do something like:

"aggs": {
  "variants.options.Size": {
    "terms": {
      "field": "variants.options.Size"
    }
  },
  "variants.options.Material": {
    "terms": {
      "field": "variants.options.Material"
    }
  }
}
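For reference, when the option keys are known up front, a request body like the one above can be generated from a list of keys. This is only a minimal Python sketch: build_option_aggs is a hypothetical helper, and wrapping the terms aggregations in a nested aggregation is an assumption based on variants being mapped as a nested type.

```python
import json


def build_option_aggs(option_keys, nested_path="variants", prefix="variants.options"):
    """Build a search body with one terms aggregation per known option key.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    per_key = {
        f"{prefix}.{key}": {"terms": {"field": f"{prefix}.{key}"}}
        for key in option_keys
    }
    return {
        "size": 0,
        "aggs": {
            # Assumption: variants is a nested field, so the terms
            # aggregations are placed inside a nested aggregation.
            "variants": {
                "nested": {"path": nested_path},
                "aggs": per_key,
            }
        },
    }


body = build_option_aggs(["Size", "Material"])
print(json.dumps(body, indent=2))
```

This only helps once the keys are available from somewhere, which is exactly the limitation described next.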

The caveat is that we use the flattened type for options because its fields are dynamic, so I have no way of knowing beforehand that we want to aggregate on Size and Material.

Essentially, I want to tell Elasticsearch that it should aggregate on whatever keys it finds under options. Is there a way to do this?

score 0 · Answer 1 · answered Feb 26 '21 at 21:13

I want to tell Elasticsearch that it should aggregate on whatever keys it finds under options. Is there a way to do this?

Not directly. I had the same question a while back. I haven't found a clean solution to this day and I'm convinced there isn't one.

Luckily, there's a scripted_metric workaround that I outlined here. Applying it to your use case:

POST your_index/_search
{
  "size": 0,
  "aggs": {
    "dynamic_variant_options": {
      "scripted_metric": {
        "init_script": "state.buckets = [:];",
        "map_script": """
          def variants = params._source['variants'];
          for (def variant : variants) {
            for (def entry : variant['options'].entrySet()) {
              def key = entry.getKey();
              def value = entry.getValue();
              def path = "variants.options." + key;

              if (state.buckets.containsKey(path)) {
                if (state.buckets[path].containsKey(value)) {
                  state.buckets[path][value] += 1;
                } else {
                  state.buckets[path][value] = 1;
                }
              } else {
                state.buckets[path] = [value:1];
              }
            }
          }
        """,
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}

would yield:

"aggregations" : {
  "dynamic_variant_options" : {
    "value" : [
      {
        "buckets" : {
          "variants.options.Size" : {
            "S" : 1,
            "XS" : 1
          },
          "variants.options.Material" : {
            "Wool" : 1
          }
        }
      }
    ]
  }
}

You'll need to adjust the Painless code if you want the buckets to be arrays of key/doc_count pairs instead of hash maps as in my example.
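As an alternative to changing the Painless script, the reshaping can be done on the client. The sketch below is a Python illustration, not part of the original workaround: merge_shard_states is a hypothetical helper that takes the "value" list from the response (the reduce_script returns one state per shard), sums the counts across shards, and emits key/doc_count pairs.

```python
from collections import Counter, defaultdict


def merge_shard_states(shard_states):
    """Merge per-shard states into {path: {"buckets": [{"key", "doc_count"}, ...]}}.

    Hypothetical client-side helper; expects the "value" list returned by
    the scripted_metric aggregation shown above.
    """
    merged = defaultdict(Counter)
    for state in shard_states:
        for path, counts in state.get("buckets", {}).items():
            # Counter.update adds the counts instead of overwriting them.
            merged[path].update(counts)
    return {
        path: {
            "buckets": [
                {"key": key, "doc_count": count}
                # most_common() orders by count, highest first.
                for key, count in counts.most_common()
            ]
        }
        for path, counts in merged.items()
    }


# The "value" list from the response above.
value = [
    {
        "buckets": {
            "variants.options.Size": {"S": 1, "XS": 1},
            "variants.options.Material": {"Wool": 1},
        }
    }
]
result = merge_shard_states(value)
```

Note that the script increments once per variant, so doc_count here counts variants carrying a given value rather than top-level products.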
