0
df1 <- data_frame(time1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
              time2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
              id = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
df2 <- data_frame(time = sort(runif(100, 0, 10)),
              C = rbinom(100, 1, 0.5))

For every row in df1, I want to find the rows in df2 that overlap for time, then assign the median C value for this group of df2 rows to a new column in df1. I'm sure there's some simple way to do this with dplyr's between function, but I'm new to R and haven't been able to figure it out. Thanks!

mowglis_diaper
  • 479
  • 1
  • 9
  • 18

2 Answers2

0

Here's a way, using the merge function to basically do a SQL style cross join, then using the between function:

library(tidyverse)
merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    filter(time_between) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(C))

Using the filter function may result in losing some rows from df1, so an alternative method would be:

merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(ifelse(time_between, C, NA), na.rm = TRUE))
bouncyball
  • 10,631
  • 19
  • 31
0

You can do this in base R with sapply:

df1$median_c <- sapply(seq_along(df1$id), function(i) {

    median(df2$C[df2$time > df1$time1[i] & df2$time < df1$time2[i]])

})
ulfelder
  • 5,305
  • 1
  • 22
  • 40