I have two datasets, "January" and "Samples".
January:
structure(list(Month = c("January-", "January-", "January-",
"January-", "January-"), long = c(-179.916672, -179.75, -179.583328,
-179.416672, -179.25), lat = c(39.916668, 39.916668, 39.916668,
39.916668, 39.916668), npp = c(297.813, 304.971, 292.946, 296.196,
285.804)), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
Samples:
structure(list(Lat = c(-14.5472718653846, -14.3532975333333,
-14.2926716206897, -14.2153998571429, -14.0921711666667),
Long = c(-168.131368846154, -170.325961333333, -169.499131724138,
-169.060881071429, -169.664168333333), Sample = c(1, 2, 3, 4, 5)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
"January" has lat/long and npp (surface productivity) values. The dataset is very large, and its latitude ranges from -19 to 30 decimal degrees (even though every row in the example above has the same latitude).
"Samples" has the Lat/Long of the 120 samples whose coordinates I am trying to match to "January" in order to get npp values.
I want to use match_nrst_haversine from hutilscpp. This is very similar to the question "Match two datasets by minimum geospatial distance (R)"; however, I only have the Lat and Long in the "Samples" df to match to "January".
This is the code I've tried, but I'm not sure what to use for Index:
January[, c("long", "lat", "npp") := match_nrst_haversine(Lat,
                                                          Long,
                                                          npp_lat = npp$Lat,
                                                          npp_lon = npp$Long,
                                                          Index = samp$Lat:Long,
                                                          close_enough = 0,
                                                          cartesian_R = 5)]
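To make the intended result concrete, here is a minimal brute-force sketch in base R of the matching described above: each sample is assigned the npp of the "January" row at the smallest great-circle distance. It does not use hutilscpp. The helper haversine_km, the Earth radius of 6371 km, and the output columns npp and dist_km are assumptions introduced only for illustration, and the data are the five example rows from the dput() output above.

```r
# Example rows copied from the dput() output above.
January <- data.frame(
  long = c(-179.916672, -179.75, -179.583328, -179.416672, -179.25),
  lat  = rep(39.916668, 5),
  npp  = c(297.813, 304.971, 292.946, 296.196, 285.804)
)
Samples <- data.frame(
  Lat    = c(-14.5472718653846, -14.3532975333333, -14.2926716206897,
             -14.2153998571429, -14.0921711666667),
  Long   = c(-168.131368846154, -170.325961333333, -169.499131724138,
             -169.060881071429, -169.664168333333),
  Sample = c(1, 2, 3, 4, 5)
)

# Great-circle distance in km (haversine formula), vectorised over its inputs.
# haversine_km is a hypothetical helper, not a hutilscpp function.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each sample, find the row of January with the smallest distance.
nearest_row <- vapply(seq_len(nrow(Samples)), function(i) {
  which.min(haversine_km(Samples$Lat[i], Samples$Long[i],
                         January$lat, January$long))
}, integer(1))

# Attach the matched npp value and the distance to that grid point.
Samples$npp <- January$npp[nearest_row]
Samples$dist_km <- haversine_km(Samples$Lat, Samples$Long,
                                January$lat[nearest_row],
                                January$long[nearest_row])
```

With 120 samples, the loop makes one vectorised pass over "January" per sample, which may be acceptable even for a large grid; a dedicated nearest-neighbour routine such as match_nrst_haversine would presumably be faster.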