0

I have two large databases (500k to 3M rows) inside PostgreSQL, each containing a set of GPS lat longs. I need to compare all coordinates in one database with that of the other database, and find points that are within 300m of each other.

I started using PostgreSQL as I had heard about its spatial indexing which speeds up geometry-related tasks a lot. The idea is to use spatial indexing e.g. R Trees, to only check nodes that were already determined to be close to each other, instead of checking the entire database every time O(n^2)

However, I couldn't find anything related to this. *Edit: I am not looking for a distance calculation algorithm, I am looking for optimizations to speed up the comparison of locations in my 2 tables. So it is not a duplicate question.

  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Jan 30 '18 at 12:41
  • Possible duplicate of [Calculate distance between two latitude-longitude points? (Haversine formula)](https://stackoverflow.com/questions/27928/calculate-distance-between-two-latitude-longitude-points-haversine-formula) – halfer Jan 30 '18 at 12:43
  • I would loop through every item in the database, and for each one find all points within a bounding box (which is fast) and then for that small set, apply the Haversine formula (see link). You can do this in a stored procedure to reduce database I/O. – halfer Jan 30 '18 at 12:44
  • Sorry about that, Ill keep that in mind next time. – user3577892 Jan 30 '18 at 16:20

1 Answers1

0

Just giving a tip here.

Distance is a result of a static function (calculation) that has input the 2 sets of long and lat.

Work with the formula you can find on the internet to find the deviation (sum of lat and long) bigger than which the distance is certainly greater than 300 km .

This deviation will be bigger the closer the point to the poles is and will be lesser the closer the point to the equator is. So divide the coords in 3-4 different areas based on how close to the equator they are, then calculate how different the other coords must be in order to be certain that distance is higher than 300km and then work only with the remaining coords.

that would drastically improve the performance of any type of search you plan to perform.

good luck!

  • Is there any spatial indexing or related feature I can use in postgre to automate the division of coords into areas? That is one of the reasons why I am using postgre – user3577892 Jan 30 '18 at 12:28