  • I have two matrices with large amounts of GPS data:

    1. User-based GPS data for each user i: ((Latitude_i, Longitude_i), ...), ~12 million GPS coordinates
    2. Store-based GPS data for each store j: ((Latitude_j, Longitude_j), ...), ~15,000 GPS coordinates
  • What I ultimately need is the closest store j (from 2.) for each user i (from 1.).

  • The brute-force (but computationally infeasible) solution would be to calculate the geographical distance between each user i (from 1.) and each store j (from 2.) and then take the smallest distance.

  • Since this would result in a 12 million x 15,000 matrix (roughly 1.8e11 distances) and I do not have access to a big data infrastructure, this does not work for me.
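To make the brute-force baseline concrete, here is a minimal sketch in Python (used only for illustration; the question itself appears to be about R). It assumes the haversine formula as the "geographical distance" and a mean Earth radius of 6371 km, and the names `haversine_km`, `nearest_store` and `nearest_stores` are hypothetical. It handles one user at a time, so the full user-by-store matrix is never held in memory, although the number of distance evaluations is unchanged.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_store(user, stores):
    """Return (store_index, distance_km) of the store closest to one user."""
    best_j, best_d = -1, float("inf")
    for j, (lat, lon) in enumerate(stores):
        d = haversine_km(user[0], user[1], lat, lon)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def nearest_stores(users, stores):
    """Yield one (store_index, distance_km) per user without building a matrix."""
    for user in users:
        yield nearest_store(user, stores)
```

Memory stays proportional to the number of stores, so the matrix size is not the real obstacle here; the running time of the inner loop over all stores is.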

So I am looking for a smarter solution.

  • What crossed my mind so far was the idea of finding the numerically closest point between each user i (1.) and each store j (2.) using `apply` and `which.min(abs(lat_i - lat_j) + abs(long_i - long_j))`, and then calculating the geographical distance between these two points.

  • However, the challenge here is that I need a function that minimizes the overall difference across both coordinates at once, and the above solution doesn't seem to work.
  • Any help is very much appreciated!
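One possible reason a raw-degree proxy can pick the wrong store: a degree of longitude covers less ground than a degree of latitude away from the equator. The sketch below (Python, for illustration only; all function names are hypothetical) compares a sum of absolute differences in raw degrees with a squared proxy whose longitude difference is scaled by the cosine of the mean latitude. It assumes short distances and ignores wrap-around at ±180° longitude.

```python
import math


def unscaled_abs_diff(lat_i, lon_i, lat_j, lon_j):
    """Sum of absolute coordinate differences in raw degrees."""
    return abs(lat_i - lat_j) + abs(lon_i - lon_j)


def scaled_sq_diff(lat_i, lon_i, lat_j, lon_j):
    """Squared planar proxy with longitude shrunk by cos(mean latitude)."""
    mean_lat = math.radians((lat_i + lat_j) / 2)
    dx = (lon_i - lon_j) * math.cos(mean_lat)
    dy = lat_i - lat_j
    return dx * dx + dy * dy


def argmin_store(user, stores, proxy):
    """Index of the store that minimizes the given proxy for one user."""
    return min(
        range(len(stores)),
        key=lambda j: proxy(user[0], user[1], stores[j][0], stores[j][1]),
    )


# Made-up example at 60 degrees north: store 0 is 1.5 degrees east,
# store 1 is 1 degree north of the user.
user = (60.0, 0.0)
stores = [(60.0, 1.5), (61.0, 0.0)]
pick_unscaled = argmin_store(user, stores, unscaled_abs_diff)
pick_scaled = argmin_store(user, stores, scaled_sq_diff)
```

In this made-up example store 0 is the closer one on the ground (roughly 83 km versus roughly 111 km), yet the unscaled proxy prefers store 1. The scaled proxy is still an approximation, so it is best used to shortlist candidates before computing a proper geographic distance.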
ReLa
  • Related: [*Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)*](https://stackoverflow.com/q/31668163/2204410) (but probably too computationally intensive on your datasets) – Jaap Aug 21 '19 at 10:48
  • Thank you! And yes, that's the problem. I need to do some smart pre-filtering or anything like that. – ReLa Aug 21 '19 at 10:51
  • Maybe it's better to use `(lat_i-lat_j)^2` instead of `abs(lat_i-lat_j)`. And maybe it's better to solve this with a database that has a GiST index: https://dba.stackexchange.com/q/182839 – GKi Aug 21 '19 at 11:25
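The two ideas in the comments, squared differences and an index, can also be combined without a database. A minimal sketch, in Python for illustration and with hypothetical names throughout: map every coordinate to a 3D point on the unit sphere, where the squared straight-line (chord) distance increases with the great-circle distance, and index the stores in a small k-d tree. Each user lookup then visits only a fraction of the stores on typical data. A spherical Earth of radius 6371 km is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def to_unit_vector(lat, lon):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    phi, lmb = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lmb),
            math.cos(phi) * math.sin(lmb),
            math.sin(phi))


def build_kdtree(items, depth=0):
    """Build a 3D k-d tree from a list of (point, store_index) pairs."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared_chord_distance, store_index) of the nearest store."""
    if node is None:
        return best
    (point, index), axis, left, right = node
    d2 = sum((p - t) ** 2 for p, t in zip(point, target))
    if best is None or d2 < best[0]:
        best = (d2, index)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def chord2_to_km(d2):
    """Convert a squared chord length on the unit sphere to kilometres."""
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(d2) / 2))


# Usage with made-up coordinates: build the tree once, then query per user.
stores = [(48.8566, 2.3522), (52.52, 13.405), (40.4168, -3.7038)]
tree = build_kdtree([(to_unit_vector(lat, lon), j)
                     for j, (lat, lon) in enumerate(stores)])
d2, j = nearest(tree, to_unit_vector(50.1109, 8.6821))
distance_km = chord2_to_km(d2)
```

The tree is built once over the stores, so the per-user work no longer scales with the full store list in the typical case. Existing k-d tree or spatial index implementations would likely be faster than this hand-rolled version.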

0 Answers