So I have 2 dataframes, one (dfA) with climate data that has over 100,000 rows, and one (dfB) as a list of sampling criteria (seed zone, altitude, and freq) with only about 50 rows.
What I'm aiming to do is use dfB as a basis to randomly sample rows from dfA based on shared columns (seed zone and altitude), picking n rows based on the freq column. This has been successful! The first code below is what I have managed to make work as far as I can tell, but it gives very limited results as the values have to match exactly and very few do.
Ideally, what I'd like to do, is have the sampling happen as above, but across a range of altitudes based on the dfB$Altitude value (say +-10)
Code for basic sampling (works):
sample <- dfA %>%
inner_join(dfB) %>%
group_by(Seed_zone, Altitude) %>%
sample_n(first(freq))
Code I've tried (in multiple variations) that hasn't worked
sample <- dfA %>%
filter(dfA > dfB -10 & dfA < dfB +10) %>%
left_join(dfB, by = c("Seed_zone")) %>%
group_by(Seed_zone) %>%
sample_n(first(freq))
dfA example data
Climate data | Altitude | Seed_zone |
---|---|---|
5.107707 | 305 | 101 |
4.845323 | 499 | 203 |
2.985152 | 7 | 101 |
22.17268 | 299 | 101 |
dfB example data
Altitude | Seed_zone | freq |
---|---|---|
300 | 101 | 2 |
502 | 203 | 1 |
Expected outcome (using +- 10)
Climate data | Altitude | Seed_zone | freq |
---|---|---|---|
5.107707 | 305 | 101 | 2 |
4.845323 | 499 | 203 | 1 |
22.17268 | 299 | 101 | 2 |