1

I want to create a function which takes the row of dataframe df1 (with columns x1, x2, x3), and the output of this function is the subset of the dataframe df2 (with columns y1, y2), this subset was calculated based on the values in df1 row. I want to apply this function for each row of df1 and stack the resulting dataframes (subsets of df2) in one big dataframe. Example how it can be done with for loop: Example of df1:

x1   x2   x3
a    3.1  4.5
b    9.0  10.1
a    9.0  20.0
c    1.1  6.0

Example of df2:

y1  y2
a   4.0
a   10.0
a   3.5
b   9.8
b   9.5
b   25.0
c   8.2
c   12.0

Example of the for loop which does this processing:

desired_df = df2[1, ]
for (i in 1:nrow(df1)) {
  subset = filter(df2, y1 == df1[i, "x1"] & y2 > df1[i, "x2"] & y2 < df1[i, "x3"])
  desired_df = rbind(desired_df, subset)
}
desired_df = desired_df[-1, ]

the desired dataframe is:

  y1   y2
  a  4.0
  a  3.5
  b  9.8
  b  9.5
  a 10.0

Depending on the values in df1, subsetting can give dataframes of different lengths (sometimes there are no elements) The question is: how to write this process of subsetting and appending in vectorized form, without for loop?

bastak
  • 103
  • 5
  • 2
    Please show a small reproducible example and expected output – akrun Apr 13 '19 at 14:26
  • If all your function does it filter rows, applygin this *"for each row of df1"* is really inefficient. It's best to do the filtering in a vectorized manner. As akrun suggested, if you can improve the question a little (e.g., output of `dput(head(x))` and your expected output from that sample), it would really help. Suggested reads: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Apr 13 '19 at 14:53
  • @akrun edited the question and added reproducible example – bastak Apr 14 '19 at 10:40

1 Answers1

0

Looks like we need a fuzzy_join

library(dplyr)
library(fuzzyjoin)
fuzzy_inner_join(df1, df2, by = c('x1' = 'y1', 'x2' = 'y2', 'x3' = 'y2'),
          match_fun = list(`==`, `<=`, `>`)) %>%
    select(names(df2))
#  y1   y2
#1  a  4.0
#2  a  3.5
#3  b  9.8
#4  b  9.5
#5  a 10.0

data

df1 <- structure(list(x1 = c("a", "b", "a", "c"), x2 = c(3.1, 9, 9, 
1.1), x3 = c(4.5, 10.1, 20, 6)), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(y1 = c("a", "a", "a", "b", "b", "b", "c", "c"), 
    y2 = c(4, 10, 3.5, 9.8, 9.5, 25, 8.2, 12)), class = "data.frame", 
    row.names = c(NA, 
-8L))
akrun
  • 874,273
  • 37
  • 540
  • 662