How to subset from dataframe2 depending on the values in dataframe1 and stack all subsets in one dataframe in R?

Question

I want to create a function which takes the row of dataframe df1 (with columns x1, x2, x3), and the output of this function is the subset of the dataframe df2 (with columns y1, y2), this subset was calculated based on the values in df1 row. I want to apply this function for each row of df1 and stack the resulting dataframes (subsets of df2) in one big dataframe. Example how it can be done with for loop: Example of df1:

x1   x2   x3
a    3.1  4.5
b    9.0  10.1
a    9.0  20.0
c    1.1  6.0

Example of df2:

y1  y2
a   4.0
a   10.0
a   3.5
b   9.8
b   9.5
b   25.0
c   8.2
c   12.0

Example of the for loop which does this processing:

desired_df = df2[1, ]
for (i in 1:nrow(df1)) {
  subset = filter(df2, y1 == df1[i, "x1"] & y2 > df1[i, "x2"] & y2 < df1[i, "x3"])
  desired_df = rbind(desired_df, subset)
}
desired_df = desired_df[-1, ]

the desired dataframe is:

  y1   y2
  a  4.0
  a  3.5
  b  9.8
  b  9.5
  a 10.0

Depending on the values in df1, subsetting can give dataframes of different lengths (sometimes there are no elements) The question is: how to write this process of subsetting and appending in vectorized form, without for loop?

Please show a small reproducible example and expected output — akrun, Apr 13 '19 at 14:26
If all your function does it filter rows, applygin this *"for each row of df1"* is really inefficient. It's best to do the filtering in a vectorized manner. As akrun suggested, if you can improve the question a little (e.g., output of `dput(head(x))` and your expected output from that sample), it would really help. Suggested reads: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. — r2evans, Apr 13 '19 at 14:53

score 0 · Accepted Answer · answered Apr 14 '19 at 19:11

Looks like we need a fuzzy_join

library(dplyr)
library(fuzzyjoin)
fuzzy_inner_join(df1, df2, by = c('x1' = 'y1', 'x2' = 'y2', 'x3' = 'y2'),
          match_fun = list(`==`, `<=`, `>`)) %>%
    select(names(df2))
#  y1   y2
#1  a  4.0
#2  a  3.5
#3  b  9.8
#4  b  9.5
#5  a 10.0

data

df1 <- structure(list(x1 = c("a", "b", "a", "c"), x2 = c(3.1, 9, 9, 
1.1), x3 = c(4.5, 10.1, 20, 6)), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(y1 = c("a", "a", "a", "b", "b", "b", "c", "c"), 
    y2 = c(4, 10, 3.5, 9.8, 9.5, 25, 8.2, 12)), class = "data.frame", 
    row.names = c(NA, 
-8L))

How to subset from dataframe2 depending on the values in dataframe1 and stack all subsets in one dataframe in R?

1 Answers1

data