
So I have an index of events; each event has an id and can contain multiple locations:

  "id": {
    "type": "keyword"
  },
  "locations": {
    "type": "geo_point"
  },
  "title": {
    "type": "text"
  },
  etc...

Given an id, I can query Elasticsearch and get a list of locations.

For each location, I can then make a geo query and collect more events nearby, within a given area.
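
For each location, the second query I run today looks roughly like this (just a sketch; the index name geoindex, the 5km radius and the coordinates are placeholders):

POST geoindex/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "locations": {
            "lat": 40.7388,
            "lon": -73.9982
          }
        }
      }
    }
  },
  "_source": ["id"]
}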

Is there a form of query I could use to make a single round trip, maybe using aggregations?

Basically I would submit an id and get a collection of nearby ids.

Still fairly new with ES so some help would be appreciated!

mika

1 Answer


You cannot, unless you employ scripted metric aggregations, which, depending on your index size, could time out and probably won't scale.

Now, many-to-many comparisons are generally expensive to compute, so it'd be best to average each event's list of geo points (via a centroid / center of mass) and store that average in a new geo_point field (let's call it anchor). That way, you'll compare a single static reference point against other points instead of running two nested for loops trying to determine the closest pairs.
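
For instance, you could add that field to the mapping and fill it with the precomputed centroid whenever you index an event. A rough sketch (assuming a 7.x-style mapping API; geoindex and all coordinates below are made up):

PUT geoindex/_mapping
{
  "properties": {
    "anchor": {
      "type": "geo_point"
    }
  }
}

PUT geoindex/_doc/0
{
  "id": "0",
  "locations": [
    { "lat": 40.7128, "lon": -74.0060 },
    { "lat": 40.7648, "lon": -73.9904 }
  ],
  "anchor": { "lat": 40.7388, "lon": -73.9982 }
}

Here anchor is simply the arithmetic mean of the locations' latitudes and longitudes; however you compute it, the key point is that it's computed once at index time rather than on every query.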

Once you've retrieved the coordinates of the doc in question, you can group your documents by their ids, exclude "self", and order the buckets by the result of the arcDistance function:

POST geoindex/_search
{
  "size": 0,
  "aggs": {
    "nearby_ids": {
      "terms": {
        "field": "id",
        "size": 5,
        "exclude": "0", 
        "order": {
          "geo_distance": "asc"
        }
      },
      "aggs": {
        "geo_distance": {
          "min": {
            "script": {
              "source": "doc['anchor'].arcDistance(params.lat, params.lon)",
              "params": {
                "lat": 40.7388,
                "lon": -73.9982
              }
            }
          }
        }
      }
    }
  }
}

Note that the arcDistance method takes lat, lon as parameters (in that order) and returns its result in meters. The hardcoded params above are the anchor coordinates of the doc in question, and "exclude": "0" keeps that doc's own id out of the buckets.
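
And for completeness, the first round trip that fetches the anchor you plug into params above is a simple term lookup on the id (again a sketch, with "0" as a placeholder id):

POST geoindex/_search
{
  "size": 1,
  "query": {
    "term": {
      "id": "0"
    }
  },
  "_source": ["anchor"]
}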

Joe - GMapsBook.com
  • Thank you for the pointers! Looking at scripted metric aggregations, I'm dealing with ~200K items with ~5 locations each. I understand that scale is relative, but in general would you consider using it in such a case? What would you consider too large an index? – mika Apr 07 '21 at 17:04
  • You're welcome! ~200K should be just fine. In my own experience, things start to slow down around 3-5M, depending on the configuration and loads of other variables. Here's a list of my answers regarding scripted_metric (https://stackoverflow.com/search?tab=votes&q=user%3a8160318%20scripted_metric&searchOn=3), hope some of them help you get started :) – Joe - GMapsBook.com Apr 07 '21 at 19:05
  • Brilliant! Thank you! – mika Apr 07 '21 at 19:05