I have two matrices with large amounts of gps data:
- User Based GPS Data for each user i ((Latitude_i, Longitude_i), ...)) ~ 12 Mio GPS Coordinates
- Store Based GPS Data for each store j ((Latitude_j, Longitude_j), ..)) ~ 15 k GPS Coordinates
What I need ultimately is the closest store j (from 2.) for each user i (from 1.).
The brut force (but computationally not feasible) solution would be, to calculate the geographical distance between each user i (from 1.) and each store j from (2.) and then take the lowest distance.
- Since this would result in a 12 Mio x 15 k matrix and I do not have access to a Big Data infrastructure, this is not really working for me.
So I am looking for smart solutions right now.
- What crossed my mind so far, was the idea of finding the simple numerically closest point between each user i (1.) and each store j (2.)
using apply and which.min(abs(lat_i-lat_j) + abs(long_i + long_j))
and then calculate the geographical distance between these two points.
- However, the challenge here is that I need a function that minimizes the overall difference, consisting of two points and the above solution doesnt seem to work.
- Any help is very much appreciated!!