6

Consider the following Elasticsearch (v5.4) object (an "award" doc type):

{
  "name": "Gold 1000",
  "date": "2017-06-01T16:43:00.000+00:00",
  "recipient": {
    "name": "James Conroy",
    "date_of_birth": "1991-05-30"
  }
}

The mapping type for both award.date and award.recipient.date_of_birth is "date".

I want to perform a range aggregation to get a list of the age ranges of the recipients of this award ("Under 18", "18-24", "24-30", "30+"), at the time of their award. I tried the following aggregation query:

{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "recipients": {
      "nested": {
        "path": "recipient"
      },
      "aggs": {
        "age_ranges": {
          "range": {
            "script": {
              "inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
            },
            "keyed": true,
            "ranges": [{
              "key": "Under 18",
              "from": 0,
              "to": 18
            }, {
              "key": "18-24",
              "from": 18,
              "to": 24
            }, {
              "key": "24-30",
              "from": 24,
              "to": 30
            }, {
              "key": "30+",
              "from": 30,
              "to": 100
            }]
          }
        }
      }
    }
  }
}

Problem 1

But I get the following error due to the comparison of dates in the script portion:

Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].

The DateTime object is the award.date field, and the MutableDateTime object is the award.recipient.date_of_birth field. I've tried doing something like doc['recipient.date_of_birth'].date.toDateTime() (which doesn't work despite the Joda docs claiming that MutableDateTime has this method inherited from a parent class). I've also tried doing something further like this:

"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"

Which sadly also doesn't work :(

Problem 2

I notice if I do this:

"aggs": {
  "recipients": {
    "nested": {
      "path": "recipient"
    },
    "aggs": {
      "award_years": {
        "terms": {
          "script": {
            "inline": "doc['date'].date.year"
          }
        }
      }
    }
  }
}

I get 1970 with a doc_count that happens to equal the total number of docs in ES. This leads me to believe that accessing a property outside of the nested object simply does not work and gives me back some default like the epoch datetime. And if I do the opposite (aggregating dates of birth without nesting), I get the exact same thing for all the dates of birth instead (1970, epoch datetime). So how can I compare those two dates?

I am racking my brain here, and I feel like there's some clever solution that is just beyond my current expertise with Elasticsearch. Help!

If you want to set up a quick environment for this to help me out, here is some curl goodness:

curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'

That should give you a "joelinux" index with two "award" docs to test this out ("James Conroy" and "Martin McNealy"). Thanks in advance!

JoeLinux
  • 4,198
  • 1
  • 29
  • 31
  • Have you tried this script? `doc['date'].date.getYear() - doc['recipient.date_of_birth'].date.getYear()` – Val Jun 21 '17 at 16:13
  • I actually get a different error, since you're accessing the `doc.date` field from within a nested context and that cannot work. – Val Jun 21 '17 at 16:23
  • Why does the `recipient` field need to be nested? Can there be many recipients? – Val Jun 21 '17 at 16:25
  • @Val: subtracting years doesn't work, because that doesn't calculate age properly. The month and day also matter. And `recipient` is nested because there can only be one recipient. The entire document describes the awarding of an award (this award, at this day and time, to this recipient). – JoeLinux Jun 21 '17 at 17:37
  • If there can only be one recipient, `nested` is useless, that was my point – Val Jun 21 '17 at 18:16
  • @Val: My actual data is something else, this is all just example data. Assume there can be multiple recipients then. It doesn't make a difference to the actual problem, which is just trying to subtract a date in a nested property from a date in an outer property. – JoeLinux Jun 21 '17 at 18:58
  • Well, if only one recipient was the norm in your data model, it would have made a huge difference since you wouldn't need it to be nested at all, hence solving your problem. Since you're accessing the date via scripting, you can get rid of the nested type and that would solve your problem as well. Unless of course you have other kind of queries that require the recipients to be nested. – Val Jun 21 '17 at 19:03
  • Right, but my data is nested. Let's assume I can't change the data. The question is more about the nature of the query than the nature of the data structure. In reality, my data's "recipient" has multiple addresses and contact info, other statuses, etc, that are nested as well. The document is much larger than shown here, and other queries work with the nesting as well. I can't change it just for this issue. – JoeLinux Jun 21 '17 at 20:45
  • Ok, let's assume you have multiple `nested` recipients then. How do you know in your query which recipient needs to be taken into account in your script? – Val Jun 22 '17 at 03:26
  • @Val: can we just assume that the data looks exactly as I described it in the problem? It's not changing. That's the structure. – JoeLinux Jun 22 '17 at 05:11

1 Answers1

3

Unfortunately, you can't access nested and non-nested fields within the same context. As a workaround, you can change your mapping to automatically copy date from nested document to root context using copy_to option:

{
    "mappings": {
        "award": {
            "properties": {
                "name": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "date": {
                    "type": "date"
                },
                "date_of_birth": {
                    "type": "date" // will be automatically filled when indexing documents
                },
                "recipient": {
                    "properties": {
                        "name": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "date_of_birth": {
                            "type": "date",
                            "copy_to": "date_of_birth" // copy value to root document
                        }
                    },
                    "type": "nested"
                }
            }
        }
    }
}

After that you can access date of birth using path date, though the calculations to get number of years between dates are slightly tricky:

Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()

Here I convert original JodaTime date objects to system.time.LocalDate objects:

  1. Get number of milliseconds from 1970-01-01
  2. Convert to number of days from 1970-01-01 by dividing it to 86400000L (number of ms in one day)
  3. Convert to LocalDate object
  4. Create date-based Period object from two dates
  5. Get number of years between two dates.

So, the final aggregation query looks like this:

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "age_ranges": {
            "range": {
                "script": {
                    "inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
                },
                "keyed": true,
                "ranges": [
                    {
                        "key": "Under 18",
                        "from": 0,
                        "to": 18
                    },
                    {
                        "key": "18-24",
                        "from": 18,
                        "to": 24
                    },
                    {
                        "key": "24-30",
                        "from": 24,
                        "to": 30
                    },
                    {
                        "key": "30+",
                        "from": 30,
                        "to": 100
                    }
                ]
            }
        }
    }
}
Random
  • 3,807
  • 2
  • 30
  • 49
  • When you convert using `getMillis()` and then `getYears()` from that, aren't you ignoring leap years completely? – JoeLinux Jun 26 '17 at 15:13
  • No, this function accepts two dates, not a single interval, so it should handle leap years properly. As for `getMillis()`, this is just a conversion from JodaTime date to Java date. Maybe not the simplest one, but it works properly – Random Jun 26 '17 at 16:40
  • Now I see your concern. But it still should handle years properly since `Period` is a date-based amount of time, not just plain time interval – Random Jun 26 '17 at 18:26
  • Doesn't `getMillis()` return a value of type `Long`? – JoeLinux Jun 26 '17 at 18:34
  • It does. First I get number of milliseconds from 1970-01-01. After that, by dividing this value to 86400000L, I get a number of days from 1970-01-01. Finally I create `LocalDate` object which represents the original date without time. The same actions used to convert the second date. Then I pass both dates to create date-based `Period` and finally use it to get number of years between two dates. Added more details to the answer – Random Jun 26 '17 at 18:48
  • The `copy_to` feature doesn't seem to copy the data into my nested field. And according to this [GitHub issue](https://github.com/elastic/elasticsearch/issues/23077), it seems that it's not going to work. I ended up getting around all this by calculating the age at index time and attaching it as a root level property. Not my favorite, but it works, and querying that value without processing will always be more efficient. – JoeLinux Jun 26 '17 at 20:39
  • Actually it works, checked on v5.4.2. It's not included into `_source` object but the value is correct on calculations. As for your approach, yes it also solves the problem with a better performance, though doesn't quite match the original question – Random Jun 27 '17 at 08:38
  • If I use the mapping provided in the answer, and then I insert the two test awards I have at the end of my question (James Conroy and Martin McNealy), this doesn't work for me on 5.4.0. If I do a terms agg on "recipient.date" (because it should be copied), I get 1970 with 2 docs (wrong). If I do the query in the answer, it's still wrong. I should have one 24-year old and one 34-year old. It's not working for me. – JoeLinux Jun 27 '17 at 13:06
  • Sorry, you're right. `copy_to` doesn't allow to copy from root to nested doc, only vice versa. Updated my answer to avoid further misleading. – Random Jun 27 '17 at 17:13