
I'm trying to aggregate over dynamically mapped fields in Elasticsearch.

For example:

POST test/_doc/1
{
    "settings": {
        "range": {
            "value": 200,
            "display": "200 km"
        },
        "transmitter": {
            "value": 1.2,
            "display": "1.2 Ghz"
        }
    }
}

The properties under settings are dynamic. Essentially I need a query like this:

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "settings": {
            "terms": {
                "field": "settings.*.display"
            }
        }
    }
}

Since the * wildcard doesn't work in the field path here, I'm wondering if there's a way to return the field names from a Painless script and then maybe use a pipeline aggregation. I can't find the Painless equivalent of Object.keys(settings) in JavaScript.

I've seen an approach with nested objects, but I'd like to avoid that, as there might be many 'settings' properties and the default limit on nested fields (index.mapping.nested_fields.limit) is 50, compared to 10000 nested objects per document (index.mapping.nested_objects.limit).

Joe - GMapsBook.com
Patrick

1 Answer


The Painless equivalent of Object.keys() is .keySet(). You can implement the following iterative logic in a scripted metric aggregation:

GET test/_search
{
  "size": 0,
  "aggs": {
    "dynamic_fields_agg": {
      "scripted_metric": {
        "init_script": "state.map = [:];",
        "map_script": """
          // Read the dynamic object from _source; its keys are not known up front.
          def source = params._source['settings'];
          for (def key : source.keySet()) {
            // Only collect entries that actually have a "display" property.
            if (source[key].containsKey("display")) {
              if (state.map.containsKey(key)) {
                state.map[key].add(source[key].display);
              } else {
                state.map[key] = [source[key].display];
              }
            }
          }
        """,
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}

which will yield something like

{
  "aggregations":{
    "dynamic_fields_agg":{
      "value":[
        {
          "map":{
            "range":[
              "200 km"
            ],
            "transmitter":[
              "1.2 Ghz"
            ]
          }
        }
      ]
    }
  }
}

Now you can post-process the values in the reduce/combine scripts however you like.
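If you would rather leave the scripts as they are, the same post-processing can also be done on the client. Below is a minimal Python sketch, assuming the response has the shape shown above (one entry per shard under `value`). The function name `merge_shard_states` and the second shard's data are made up for illustration.

```python
def merge_shard_states(states):
    """Merge the per-shard maps into one and drop duplicate display values."""
    merged = {}
    for state in states:
        # Skip empty or null entries defensively.
        if not state:
            continue
        for key, displays in state.get("map", {}).items():
            bucket = merged.setdefault(key, [])
            for display in displays:
                # Keep first-seen order while removing duplicates.
                if display not in bucket:
                    bucket.append(display)
    return merged


# Hypothetical response: the first entry matches the output above,
# the second entry is invented to show merging and de-duplication.
response = {
    "aggregations": {
        "dynamic_fields_agg": {
            "value": [
                {"map": {"range": ["200 km"], "transmitter": ["1.2 Ghz"]}},
                {"map": {"range": ["200 km", "50 km"]}},
                None,
            ]
        }
    }
}

merged = merge_shard_states(response["aggregations"]["dynamic_fields_agg"]["value"])
print(merged)  # {'range': ['200 km', '50 km'], 'transmitter': ['1.2 Ghz']}
```

Doing this in the reduce script instead would save transferring the duplicates, but the client-side version is easier to debug.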


Using nested fields would not bring you much advantage here -- wildcard paths are not allowed there either. I asked that myself some time ago.


UPDATE -- the inline version:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "dynamic_fields_agg": {
      "scripted_metric": {
        "init_script": "state.map = [:];",
        "map_script": "def source = params._source[\"settings\"];\nfor (def key : source.keySet()) {\n  if (source[key].containsKey(\"display\")) {\n    if (state.map.containsKey(key)) {\n      state.map[key].add(source[key].display);\n    } else {\n      state.map[key] = [source[key].display];\n    }\n  }\n}",
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}
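
The triple-quoted form is Kibana Dev Tools console syntax rather than valid JSON, so other clients will typically reject it. If you are building the request in code, you don't have to escape the script by hand. Here is a minimal Python sketch, assuming you then send the resulting string as the body of `GET /test/_search` with whatever HTTP client you use. The names `MAP_SCRIPT` and `build_request_body` are made up for illustration.

```python
import json

# The Painless map script, written as a readable multi-line string.
MAP_SCRIPT = """
def source = params._source["settings"];
for (def key : source.keySet()) {
  if (source[key].containsKey("display")) {
    if (state.map.containsKey(key)) {
      state.map[key].add(source[key].display);
    } else {
      state.map[key] = [source[key].display];
    }
  }
}
"""


def build_request_body(map_script=MAP_SCRIPT):
    """Build the scripted_metric request body as a plain dict."""
    return {
        "size": 0,
        "aggs": {
            "dynamic_fields_agg": {
                "scripted_metric": {
                    "init_script": "state.map = [:];",
                    "map_script": map_script,
                    "combine_script": "return state",
                    "reduce_script": "return states",
                }
            }
        },
    }


# json.dumps escapes the newlines and quotes inside the script,
# producing the same kind of single-line string as the inline version.
payload = json.dumps(build_request_body())
print(payload)
```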
Joe - GMapsBook.com
  • I can't get the multiline script to work, but it works inline, thank you! This is pretty impressive, I'll look into the reduce function to remove dupes, then the result is exactly what I need. – Patrick Aug 28 '20 at 13:56
  • Cool! I've added the inline version to my answer. – Joe - GMapsBook.com Aug 28 '20 at 13:58
  • Seems by using `combine_script: 'return state.map'`, the output is reduced by one level. Also, my approach with `reduce` was wrong, instead I'm not adding the value in the first place, with another condition `if (!state.map[key].values.contains(source[key].displayValue)) {`. Thanks! – Patrick Aug 28 '20 at 15:33