
I have been successfully using Python with elasticsearch-dsl (https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#) and composite aggregations for some time now, but have come across a problem I can't solve. The data I'm querying is asset data for multiple organizations and consists of one row per asset. Each row includes two dates (scan_date and timestamp), and I need the search result to give me the maximum of (timestamp - scan_date), i.e., the oldest "scan_age", for each organization. In SQL terms:

SELECT organization, MAX(timestamp - scan_date) as oldest_scan_age
FROM database
GROUP BY organization

The elasticsearch-dsl library includes the method script_fields() to define calculated fields, but from what I've read (and tried), aggregations cannot be performed on script fields. The elasticsearch feature I need seems to be "runtime fields", but elasticsearch-dsl does not appear to provide a function I can call to specify a runtime field. If such a function existed, I could call it to define a new field called "scan_age" and then find its maximum per organization with a standard composite aggregation.

How can I perform this query using elasticsearch-dsl?

Please note that (1) I already know how to do this using the JSON syntax typically associated with querying elasticsearch; what I'm looking for is how to do it in elasticsearch-dsl. (2) I would happily create my own JSON to define just the runtime field if I had a way to use it programmatically to modify my elasticsearch-dsl search object.

RonR

2 Answers


Until this feature is added, a workaround is to call extra(), unpacking a dictionary that defines the standard JSON for the runtime field. This adds the JSON to the body of the search object as a top-level key. For example:

d = {
    "runtime_mappings": {
        "age": {
            "type": "long",
            # Painless can't subtract two date values directly, so convert
            # each to epoch milliseconds first
            "script": {
                "source": "emit(doc['@timestamp'].value.toInstant().toEpochMilli() - doc['host.last_seen_date'].value.toInstant().toEpochMilli())"
            }
        }
    }
}

search_obj = search_obj.extra(**d)
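
With the runtime field in place, the aggregation itself can be added through the normal elasticsearch-dsl aggs API. A minimal sketch (the bucket and metric names, and the "organization" keyword field, are illustrative and not part of the original example):

# Group by organization with a composite aggregation and take the maximum
# of the runtime field "age" defined above. Bucket/metric names and the
# "organization" field are illustrative.
search_obj.aggs.bucket(
    "orgs",
    "composite",
    sources=[{"organization": {"terms": {"field": "organization"}}}],
).metric("oldest_scan_age", "max", field="age")

response = search_obj.execute()
for bucket in response.aggregations.orgs.buckets:
    print(bucket.key.organization, bucket.oldest_scan_age.value)

Note that aggs modifies the search object in place, so no reassignment is needed for the aggregation step.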
RonR

There is a pull request to add support for runtime fields, here:

https://github.com/elastic/elasticsearch-dsl-py/pull/1611/commits

You can define the runtime field in the mappings, and then just run a regular aggregation using the DSL.

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "oldest_scan_age": {
        "type": "date",
        "script": {
          "source": "emit(your_field_logic)"
        }
      }
    },
    "properties": {
      "@timestamp": {"type": "date"}
    }
  }
}
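
Once the runtime field is part of the index mapping, a regular aggregation over it with elasticsearch-dsl could look roughly like this (a sketch; the "organization" field and the aggregation names are assumptions, not from the original question):

from elasticsearch_dsl import Search

# Terms bucket per organization with a max metric over the runtime field
# defined in the mapping above. Field and aggregation names are illustrative.
s = Search(index="my-index-000001")
s.aggs.bucket("by_org", "terms", field="organization").metric(
    "oldest_scan_age", "max", field="oldest_scan_age"
)

response = s.execute()
for bucket in response.aggregations.by_org.buckets:
    print(bucket.key, bucket.oldest_scan_age.value)

This assumes a default connection has already been configured via elasticsearch_dsl.connections.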

There is an answer that shows how to work with a max aggregation on date fields:

https://stackoverflow.com/a/49890932/3112848

Let me know if it worked or if you need more help.

llermaly
  • Thank you, but I was specifically looking for a way to do this with the python library elasticsearch-dsl (see link in post). – RonR Nov 28 '22 at 02:34
  • There is a pull request to add support for runtime fields, here: https://github.com/elastic/elasticsearch-dsl-py/pull/1611/commits, so you should mark this as the correct answer. I wonder if you can send raw ES queries using the DSL; in that case you can just add the runtime field to the raw query. – llermaly Nov 28 '22 at 03:48
  • This is good news. I'd mark it as the answer if it were an answer instead of a comment. – RonR Nov 28 '22 at 13:51
  • I added the PR link to the current answer. I kept the instructions to do it in the mappings as well, it may be useful. – llermaly Nov 28 '22 at 13:55