How can I calculate the average density of points on a map for a defined region?

Question

Let's say I have 20k plus points of data with lat/lng coordinates. All of these points fall within a defined region on a map. I want to calculate the average density of these points for a quarter mile radius.

I'm having trouble explaining it, but the use case would be the ability to enter some arbitrary coordinates, see how many points are within a quarter mile radius of this point, and determine whether this is above or below average for the data.

I'm not looking for a solution in any specific language, instead I'm just looking for a general (pseudo-code) solution or way to think about this problem.

this sounds similar to a heat map. – devlin carnate May 03 '16 at 22:39 — devlin carnate, May 03 '16 at 22:39

score 7 · Accepted Answer · edited May 23 '17 at 12:31

Suppose you have a bunch of latitude, longitude geocoordinates.

If you want to calculate the density of the bounding box that fits around your geocoordinates, then make one O(N) pass through your data set and determine the geocoordinates of the corners.

Once you find them, use the Haversine formula (Java implementation here) to calculate the length of the edges between two corners. Be sure to consistently choose either miles or kilometers for your unit of distance. After you compute the edge distances, you can compute the box's area in units of km^2 or miles^2. From there, compute the density as number of points divided by the area.

If you want to make an ad hoc query for the density around a single target point, then choose a radius R in miles or km. Make one O(N) pass through the data set, and compute the Haversine distance between your target point and every other point. If the other point is within distance R to your target, then add it to a result list. Then compute the density as the number of points within the circle defined by the radius.

If you make a lot of these types of queries, then precompute a spatial indexing data structure. Popular indexes are R-Trees, R*-Trees, and k-d Trees. Below is a picture of an R-Tree from Wikipedia. The tree decomposes the space into rectangular regions so you can query for points quickly.

If your points can fit into memory, then use an open-source library that implements one of these data structures. Here is a link to one library called rtree that I found that allows you to find all points within some radius. I have not personally used that library.

If your points do not fit into memory, then you can use an SQL database. For example, Oracle Spatial implements these types of data structures.

Wow, great answer and EXACTLY what I was looking for. Thanks! — Willem Ellis, May 05 '16 at 23:06

score 2 · Answer 2 · answered May 05 '16 at 17:49

If you care about performance, you should probably use a specialized data structure for indexing your points, something like a kd-tree. That way you can calculate the number of points close to a given point much faster because you can eliminate large chunks from the data.

If you have a lot of points distributed in a very non-uniform way, simply calculating the whole area average might not be very useful. In that case, you could generate a sample of the coordinates and calculate the mean, percentiles etc.

score 1 · Answer 3 · answered May 03 '16 at 22:48

For your use case, go through the points, determining their distance from the 'arbitrary point'. If it's more than a quarter mile, disregard that point, otherwise add to a count. At the end you have a measure of the density of points around that point.

To determine how this compares with the average you could calculate the overall average simply by dividing the total number of points by the total area.

How can I calculate the average density of points on a map for a defined region?

3 Answers3