I have two large data frames (nrow = 20,000)
, with common variables in each (X, Y and Z)
. Each data frame shares some unique values of one variable (Y)
, but the values of X and Z
differ. What I would like to do is filter one data frame (the larger of the two (df2)
), by the range of values of X
in the smaller data frame (df1)
for each unique value of Y
. So, for example, if the range of X
values in df1
for a unique Y
value of 1 is 0.02 - 0.08, then I would like to obtain all values of X
and Z
from df2
where the value of X falls within that range for the given value of Y
. I would like to make a new data frame containing these filtered values.
Here is some sample code and data to use an example
# GENERATE SOME DATA
set.seed(5)
df1 <- data.frame(X = round(runif(20, 0.2, 0.3),4), Y = rep(c(4, 5, 6, 7, 8), each = 4), Z = round(runif(20),3))
df2 <- data.frame(X = round(runif(50, 0.1, 0.6),4), Y = rep(seq(1,10,1), each = 5), Z = round(runif(50),3))
## ESTABLISH THE RANGE OF X VALUES FOR EACH UNIQUE Y VALUE IN THE DATA FRAME OF INTEREST (df1)
library(dplyr)
df3 <- df1 %>%
group_by(Y) %>%
mutate(minX=min(X), maxX=max(X)) %>%
ungroup() %>%
distinct(Y, minX, maxX)
## FILTER df2 BY THE RANGE OF Y VALUES IN df1
df4<- df2 %>% filter(Y %in% df1$Y)
So now I have df2
filtered to give me only the range of Y
values which are common to both data frames (df4)
. What I need to do here is to create a new data frame from df4
which contains only the values of X
which are within the range of X
values in df1
for each value of Y
(df3)
.
Any help greatly appreciated.