I have two data frames that I need to join: one contains soil nutrient data and the other contains yield data. Both record data from the same field, but one was recorded in Imperial units and the other in metric. How can I join the two so that every observation in the nutrient data frame is assigned the single nearest yield value?

The two dataframes are below:

YIELD

yield <- structure(list(Longitude = c(1.8937763, 1.8937744, 1.8937713, 
1.8937691, 1.8937682, 1.893768, 1.8937661, 1.8937643, 1.8937618, 
1.8937586, 1.8937553, 1.8937526, 1.8937498, 1.8937474, 1.8937452, 
1.8937431, 1.8937418, 1.8937433, 1.8937723, 1.893766, 1.8937557, 
1.8937434, 1.8937301, 1.8937179, 1.8937053), Latitude = c(54.6667203, 
54.6667327, 54.6667522, 54.6667646, 54.6667681, 54.6667683, 54.6667795, 
54.6667903, 54.666802, 54.6668161, 54.6668303, 54.6668442, 54.6668581, 
54.6668703, 54.6668801, 54.6668894, 54.6668935, 54.6668885, 54.6667066, 
54.666715, 54.6667251, 54.6667342, 54.6667433, 54.6667527, 54.6667635
), yld = c(12.68, 5.941, 3.912, 3.69, 4.214, 13.02, 10.492, 6.505, 
6.731, 5.095, 4.001, 3.535, 3.613, 3.568, 3.348, 2.89, 2.742, 
5.854, 3.684, 2.692, 5.898, 15.06, 12.04, 10.945, 7.937)), row.names = c(NA, 
25L), class = "data.frame")

NUTRIENT

nutrient <- structure(list(Latitude = c(54.66923226, 54.66926369, 54.66929511, 
54.66932653, 54.66935796, 54.66938938, 54.66901103, 54.66904245, 
54.66907387, 54.6691053, 54.66913672, 54.66916815, 54.66919957, 
54.669231, 54.66926242, 54.66929385, 54.66932527, 54.6693567, 
54.66938812, 54.66941955, 54.66945097, 54.66882122, 54.66885264, 
54.66888406, 54.66891549), Longitude = c(1.891378242, 1.891380318, 
1.891382394, 1.89138447, 1.891386546, 1.891388622, 1.891415402, 
1.891417478, 1.891419554, 1.89142163, 1.891423706, 1.891425782, 
1.891427858, 1.891429934, 1.89143201, 1.891434086, 1.891436162, 
1.891438238, 1.891440314, 1.891442391, 1.891444467, 1.891454638, 
1.891456714, 1.89145879, 1.891460866), Countrate = c(1129.055905, 
1122.331819, 1120.017601, 1117.303756, 1111.629963, 1107.838333, 
1192.336826, 1190.236609, 1186.359013, 1180.932882, 1171.95523, 
1159.86637, 1145.181517, 1133.11088, 1126.753139, 1124.103172, 
1121.31539, 1115.520496, 1111.72757, 1106.714465, 1101.951969, 
1201.191293, 1205.706169, 1208.074004, 1209.243511)), row.names = c(NA, 
25L), class = "data.frame") 

Thanks in advance!

BUCKERS99
  • Does this answer your question? [Merging two sets of data by data.table roll='nearest' function](https://stackoverflow.com/questions/54013468/merging-two-sets-of-data-by-data-table-roll-nearest-function) – user438383 May 25 '22 at 15:46
  • Unfortunately it **does** matter which row is merged from Yield to Nutrient, so no, it doesn't. The closest I have found is this answer: https://stackoverflow.com/questions/55752064/finding-closest-coordinates-between-two-large-data-sets, but it introduces lots of NA values into the end result. – BUCKERS99 May 25 '22 at 16:20

1 Answer

Converting your data to sf objects and then performing a spatial join with st_nearest_feature should do the trick. In this case, though, because of how your points are distributed, the same yield point is the closest one to all of your nutrient points (and they are in the middle of the ocean, which seems odd for soils data).

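library(sf)

# Build point geometries from the coordinate columns (WGS84, EPSG:4326)
yield_sf <- st_as_sf(yield, coords = c("Longitude", "Latitude"), crs = 4326)
nutrient_sf <- st_as_sf(nutrient, coords = c("Longitude", "Latitude"), crs = 4326)

# For every nutrient point, attach the attributes of its nearest yield point
yield_nutrient_sf <- st_join(nutrient_sf, yield_sf, join = st_nearest_feature)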
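
If you also want to see how far away each matched yield point is, or you prefer to end up with a plain data frame, something along these lines may help. This is only a sketch: it uses a handful of rows copied from the data in the question so that it runs on its own, and the names nearest_idx, nearest_dist, result and dist_m are my own choices for illustration.

```r
library(sf)

# A few rows copied from the question's data so the block is self-contained
yield <- data.frame(
  Longitude = c(1.8937763, 1.8937418, 1.8937053),
  Latitude  = c(54.6667203, 54.6668935, 54.6667635),
  yld       = c(12.68, 2.742, 7.937)
)
nutrient <- data.frame(
  Latitude  = c(54.66923226, 54.66882122),
  Longitude = c(1.891378242, 1.891454638),
  Countrate = c(1129.055905, 1201.191293)
)

yield_sf    <- st_as_sf(yield, coords = c("Longitude", "Latitude"), crs = 4326)
nutrient_sf <- st_as_sf(nutrient, coords = c("Longitude", "Latitude"), crs = 4326)

# Row index of the nearest yield point for each nutrient point
nearest_idx <- st_nearest_feature(nutrient_sf, yield_sf)

# Distance from each nutrient point to its matched yield point
# (for longitude/latitude data sf reports this in metres)
nearest_dist <- st_distance(nutrient_sf, yield_sf[nearest_idx, ],
                            by_element = TRUE)

# Plain data frame: the nutrient rows plus the matched yield and distance
result <- nutrient
result$yld    <- yield$yld[nearest_idx]
result$dist_m <- as.numeric(nearest_dist)
result
```

Because st_nearest_feature returns exactly one index per nutrient point, every nutrient row gets a yield value and no NAs are introduced by the match itself. The distance column is a quick way to spot matches that are implausibly far apart.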
BEVAN
  • Bevan. You are a genius sir! Thank you very much. Yes I have taken a sample of some data and manipulated it a bit! Thank you! – BUCKERS99 May 25 '22 at 16:40