3

Let's say I have 20k plus points of data with lat/lng coordinates. All of these points fall within a defined region on a map. I want to calculate the average density of these points for a quarter mile radius.

I'm having trouble explaining it, but the use case would be the ability to enter some arbitrary coordinates, see how many points are within a quarter mile radius of this point, and determine whether this is above or below average for the data.

I'm not looking for a solution in any specific language, instead I'm just looking for a general (pseudo-code) solution or way to think about this problem.

Willem Ellis
  • 4,886
  • 7
  • 38
  • 55

3 Answers3

7

Suppose you have a bunch of latitude, longitude geocoordinates.

enter image description here

If you want to calculate the density of the bounding box that fits around your geocoordinates, then make one O(N) pass through your data set and determine the geocoordinates of the corners.

Once you find them, use the Haversine formula (Java implementation here) to calculate the length of the edges between two corners. Be sure to consistently choose either miles or kilometers for your unit of distance. After you compute the edge distances, you can compute the box's area in units of km^2 or miles^2. From there, compute the density as number of points divided by the area.

enter image description here

If you want to make an ad hoc query for the density around a single target point, then choose a radius R in miles or km. Make one O(N) pass through the data set, and compute the Haversine distance between your target point and every other point. If the other point is within distance R to your target, then add it to a result list. Then compute the density as the number of points within the circle defined by the radius.

enter image description here

If you make a lot of these types of queries, then precompute a spatial indexing data structure. Popular indexes are R-Trees, R*-Trees, and k-d Trees. Below is a picture of an R-Tree from Wikipedia. The tree decomposes the space into rectangular regions so you can query for points quickly.

enter image description here

If your points can fit into memory, then use an open-source library that implements one of these data structures. Here is a link to one library called rtree that I found that allows you to find all points within some radius. I have not personally used that library.

If your points do not fit into memory, then you can use an SQL database. For example, Oracle Spatial implements these types of data structures.

Community
  • 1
  • 1
stackoverflowuser2010
  • 38,621
  • 48
  • 169
  • 217
2

If you care about performance, you should probably use a specialized data structure for indexing your points, something like a kd-tree. That way you can calculate the number of points close to a given point much faster because you can eliminate large chunks from the data.

If you have a lot of points distributed in a very non-uniform way, simply calculating the whole area average might not be very useful. In that case, you could generate a sample of the coordinates and calculate the mean, percentiles etc.

BKE
  • 573
  • 2
  • 16
1

For your use case, go through the points, determining their distance from the 'arbitrary point'. If it's more than a quarter mile, disregard that point, otherwise add to a count. At the end you have a measure of the density of points around that point.

To determine how this compares with the average you could calculate the overall average simply by dividing the total number of points by the total area.

Tom
  • 7,994
  • 8
  • 45
  • 62