Find overlapping rows between dataframes using dplyr?

Question

df1 <- data_frame(time1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
              time2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
              id = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
df2 <- data_frame(time = sort(runif(100, 0, 10)),
              C = rbinom(100, 1, 0.5))

For every row in df1, I want to find the rows in df2 that overlap for time, then assign the median C value for this group of df2 rows to a new column in df1. I'm sure there's some simple way to do this with dplyr's between function, but I'm new to R and haven't been able to figure it out. Thanks!

bouncyball · Answer 1 · 2018-03-22T19:26:52.447

Here's a way, using the merge function to basically do a SQL style cross join, then using the between function:

library(tidyverse)
merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    filter(time_between) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(C))

Using the filter function may result in losing some rows from df1, so an alternative method would be:

merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(ifelse(time_between, C, NA), na.rm = TRUE))

score 0 · Answer 2 · answered Mar 22 '18 at 19:40

0

You can do this in base R with sapply:

df1$median_c <- sapply(seq_along(df1$id), function(i) {

    median(df2$C[df2$time > df1$time1[i] & df2$time < df1$time2[i]])

})

answered Mar 22 '18 at 19:40

ulfelder

5,305
1
22
40

Find overlapping rows between dataframes using dplyr?

2 Answers2