
I have a main database with several weather stations. Each station has coordinates in decimal degrees. Below is just an example; the coordinates are made up.

stationid lon lat
1a        80  104
1b        84  110
1c        85  111

Separately, I have a smaller dataset of places. I need to match each place to the closest weather station from the main database (ideally within a specified distance threshold).

  place lon   lat 
  2a    80.5  104.1
  3b    83    109

The resulting smaller dataset would then show:

  place lon   lat    stationid
  2a    80.5  104.1  1a
  3b    83    109    1b

I would appreciate any ideas.

Henrik
Andres Mora

1 Answer


Try `geosphere::distm` + `max.col`:

df2$stationid <- df1$stationid[max.col(-distm(rev(df2[-1]), rev(df1[-1])))]

which gives

  place  lat   lon stationid
1    2a 80.5 104.1        1a
2    3b 83.0 109.0        1b
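
The question also asks for a distance threshold, which the one-liner above does not apply. A minimal sketch of one way to add it, assuming the relabelled `lat`/`lon` columns shown in the output and a made-up cutoff of 100 km (`max_dist_m` is a hypothetical name, not part of `geosphere`):

```r
library(geosphere)

# Made-up coordinates from the question, with the column names switched
# as in the output above (80 is a latitude, 104 a longitude)
df1 <- data.frame(stationid = c("1a", "1b", "1c"),
                  lat = c(80, 84, 85),
                  lon = c(104, 110, 111))
df2 <- data.frame(place = c("2a", "3b"),
                  lat = c(80.5, 83),
                  lon = c(104.1, 109))

max_dist_m <- 100000  # assumed threshold: 100 km, expressed in metres

# distm() expects (lon, lat) columns; rows are places, columns are stations
d <- distm(as.matrix(df2[, c("lon", "lat")]),
           as.matrix(df1[, c("lon", "lat")]))

# Column index of the closest station for each place, and that distance
nearest <- max.col(-d, ties.method = "first")
nearest_dist <- d[cbind(seq_len(nrow(d)), nearest)]

# Keep the match only when it lies within the threshold, otherwise NA
stationid <- as.character(df1$stationid)[nearest]
stationid[nearest_dist > max_dist_m] <- NA

df2$stationid <- stationid
df2$dist_m <- nearest_dist
```

With this cutoff, place `2a` keeps station `1a`, while `3b` gets `NA` because its closest station is a little over 100 km away. Keeping `dist_m` in the result makes it easy to try other thresholds afterwards.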
ThomasIsCoding
  • @AndresMora You can read about the usage of `distm` by typing `?distm`. The documentation describes its arguments as "longitude/latitude of point(s). Can be a vector of two numbers, a matrix of 2 columns (first one is longitude, second is latitude) or a SpatialPoints* object". I think your `lon` and `lat` names should be switched. – ThomasIsCoding Mar 28 '21 at 02:26
  • I'm giving this a try right now but getting `Error in .pointsToMatrix(x) : latitude < -90`. My coordinate data is in decimal degrees. Can that be the issue? – Andres Mora Mar 28 '21 at 02:27
  • Just fixed that. I guess it was the lon/lat order. Now I get `Error: cannot allocate vector of size 73.0 Gb` – Andres Mora Mar 28 '21 at 02:29
  • @AndresMora Perhaps you have a super big dataset – ThomasIsCoding Mar 28 '21 at 02:30
  • @AndresMora If you don't want to have that memory space error, you can try `for` loops for finding the closest point. That would be slow but safe. – ThomasIsCoding Mar 28 '21 at 02:34
  • How is the "closest" point picked? What distance criterion does your approach use? – Andres Mora Mar 28 '21 at 02:47
  • @AndresMora I used `distGeo`, which is default in `distm` – ThomasIsCoding Mar 28 '21 at 02:48
  • Maybe: [For each point, distance to nearest point in second dataset in R](https://stackoverflow.com/questions/37333747/for-each-point-distance-to-nearest-point-in-second-dataset-in-r). There OP seems to have the same issue: "I can do the naive implementation by _calculating all pairwise distances_ using `gDistance` and taking the min [like in the answer here], but I have some huge datasets and was looking for something more efficient" – Henrik Mar 28 '21 at 03:08
  • @Henrik Good recommendation. Thanks a lot! – ThomasIsCoding Mar 28 '21 at 10:58
  • @ThomasIsCoding You are welcome! But eventually it seems like `Error: cannot allocate vector` wasn't a problem for OP after all ;) Cheers – Henrik Mar 28 '21 at 11:16
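
For reference, the `for`-loop idea mentioned in the comments could look roughly like the sketch below. It is only an illustration: `nearest_station_chunked` and `chunk_size` are hypothetical names, and the columns are assumed to be named `lon` and `lat` with the values the right way round. The places are processed in chunks, so only a `chunk_size` x `nrow(stations)` distance matrix is held in memory at any one time, at the cost of more `distm` calls.

```r
library(geosphere)

# Hypothetical helper: nearest station per place, computed chunk by chunk
nearest_station_chunked <- function(places, stations, chunk_size = 1000) {
  st <- as.matrix(stations[, c("lon", "lat")])
  n <- nrow(places)
  idx <- integer(n)      # row index of the nearest station for each place
  dist_m <- numeric(n)   # distance to that station, in metres

  # Split the row numbers 1..n into consecutive chunks
  chunks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  for (rows in chunks) {
    # Distances for this chunk only (distGeo is the default in distm)
    d <- distm(as.matrix(places[rows, c("lon", "lat")]), st)
    best <- max.col(-d, ties.method = "first")
    idx[rows] <- best
    dist_m[rows] <- d[cbind(seq_along(rows), best)]
  }

  data.frame(places,
             stationid = as.character(stations$stationid)[idx],
             dist_m = dist_m)
}

# Made-up example data with correctly labelled columns
df1 <- data.frame(stationid = c("1a", "1b", "1c"),
                  lat = c(80, 84, 85),
                  lon = c(104, 110, 111))
df2 <- data.frame(place = c("2a", "3b"),
                  lat = c(80.5, 83),
                  lon = c(104.1, 109))

res <- nearest_station_chunked(df2, df1, chunk_size = 1)
```

The chunk size only affects memory use and speed, not the result. For very large inputs, a spatial index as in the linked question is likely to scale better than any all-pairs approach.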