
I have more than 1.5 million points on a map with GPS coordinates, and I need to find points that are related to each other, for example within 50 meters. My first thought was to build a distance matrix, but it would be huge, and computing that many combinations with the Haversine formula would be infeasible.

I want to implement this in Python.

Any ideas?
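To put the scale in numbers: n points give n(n-1)/2 distinct pairs. The sketch below is illustrative only. It assumes n = 1,500,000 (the figure from the question), a dense float64 matrix (8 bytes per entry), and a spherical Earth with a mean radius of 6,371 km for the haversine distance.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


n = 1_500_000                 # number of points (from the question)
pairs = n * (n - 1) // 2      # distinct unordered pairs to check
matrix_bytes = n * n * 8      # dense n x n matrix of float64 distances
print(f"{pairs:,} pairs")     # 1,124,999,250,000 pairs
print(f"{matrix_bytes / 1e12:.0f} TB for a dense float64 matrix")  # 18 TB
```

On that sphere one degree of latitude is about 111.2 km, so a 50 m radius spans only about 0.00045 degrees of latitude, and almost all of those pairs lie far outside it.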

  • Are the points near the poles, or near the -180/+180 longitude? If not, just pre-filter on points within 0.01 degree. GIS tools can often do better (e.g. by grouping regions), but GIS tools for Python (e.g. shapely) are designed to iterate over every pair. If it is just a one-time job, do it quick and dirty: by tomorrow you will have the results. – Giacomo Catenazzi Sep 19 '22 at 15:06
  • What is your ultimate goal? Do you just want all the pairs that meet the criterion, or are you going to do something with them next? I thought about applying one of the clustering algorithms (https://scikit-learn.org/stable/modules/clustering.html) as a first step. Another idea is to split the map into overlapping 100 m × 100 m squares, so that each point belongs to 4 such squares, and then build the distance matrix for each square. – Boris Silantev Sep 19 '22 at 15:07
  • You have not fully specified the desired return value of `process(pointlist)`. Search `find close point algorithm` for some ideas on existing algorithms. – Terry Jan Reedy Sep 19 '22 at 15:16
  • All my points are in latitude/longitude format with 13 digits after the decimal point (e.g. 14.123456790123). My main goal is to find points that lie within a 50-meter circle of each other. – Igor Gavrilov Sep 19 '22 at 15:46
  • PS: maybe this clarifies the problem. I'm doing research on accidents: I have a database of 1 million accidents and want to find places where accidents happen frequently. I think 50-meter areas are enough. – Igor Gavrilov Sep 19 '22 at 15:48
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Sep 19 '22 at 17:16
  • Since you want to find areas of high density, I would stick with the idea of using clustering algorithms. You may find this interesting: https://stackoverflow.com/questions/16381577/scikit-learn-dbscan-memory-usage – Boris Silantev Sep 19 '22 at 17:41
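The pre-filter from the first comment and the overlapping-squares idea can be combined into a grid index: bucket each point into a cell at least 50 m on a side, then run the exact haversine check only against points in the same cell or the 8 adjacent cells. Below is a minimal pure-Python sketch of that technique, not a production tool. It assumes a spherical Earth and no points near the poles or the ±180° meridian (the caveat raised in the first comment); `neighbor_counts` is a hypothetical helper name. Counting neighbors within 50 m per point is only a crude density measure for spotting frequent-accident locations, not DBSCAN.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # spherical-Earth approximation (assumption)
M_PER_DEG = EARTH_RADIUS_M * math.pi / 180  # ~111,195 m per degree of latitude


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def neighbor_counts(points, radius_m=50.0):
    """Hypothetical helper: for each (lat, lon) point, count other points within radius_m.

    Assumes no points near the poles or the +/-180 degree meridian.
    """
    if not points:
        return []
    max_abs_lat = max(abs(lat) for lat, _ in points)
    # Cell height and width in degrees, each at least radius_m on the ground at every
    # latitude present, so any neighbor within radius_m is in the same or an adjacent cell.
    # The 1% margin covers the small-angle approximation in the longitude bound.
    cell_lat = 1.01 * radius_m / M_PER_DEG
    cell_lon = cell_lat / math.cos(math.radians(max_abs_lat))

    grid = defaultdict(list)  # (row, col) -> indices of the points in that cell
    for i, (lat, lon) in enumerate(points):
        grid[(math.floor(lat / cell_lat), math.floor(lon / cell_lon))].append(i)

    counts = [0] * len(points)
    for (row, col), members in grid.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                # .get() avoids inserting empty cells while iterating over the grid
                others = grid.get((row + dr, col + dc), ())
                for i in members:
                    lat1, lon1 = points[i]
                    for j in others:
                        if j != i and haversine_m(lat1, lon1, *points[j]) <= radius_m:
                            counts[i] += 1
    return counts


pts = [(52.52000, 13.40500), (52.52020, 13.40510), (52.53000, 13.40500)]
print(neighbor_counts(pts))  # [1, 1, 0]
```

Points whose count exceeds a threshold you choose are candidate hotspots. Work per point grows with how many points share its neighborhood, so a very dense cell still costs roughly quadratic time locally. For proper density-based clustering, scikit-learn's `DBSCAN` accepts `metric='haversine'` with `algorithm='ball_tree'`; that metric expects `[lat, lon]` in radians, so `eps` would be the radius in meters divided by the Earth's radius.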

0 Answers