Your question is hard to understand. Some examples would go a long way to clearing up what you are trying to do.
This is my understanding: You want to compare an id
from tbl1
to tbl2
. In cases that the id matches, you want to know how often the date
from tbl1
is between a start
and an end
from tbl2
.
First I will create some fake data.
library(tidyverse)
tbl1 <- tribble(
~id, ~date,
1, "2018-01-01",
2, "2018-01-01",
3, "2018-01-01",
5, "2018-01-01"
) %>%
mutate(date = as.Date(date))
tbl2 <- tribble(
~id, ~start, ~end,
1, "2017-01-01", "2019-01-01",
1, "2016-01-01", "2019-01-01",
1, "2015-01-01", "2019-01-01",
2, "2017-01-01", "2019-01-01",
2, "2019-01-01", "2020-01-01",
3, "2019-01-01", "2020-01-01",
4, "2019-01-01", "202o-01-01"
) %>%
mutate(start = as.Date(start),
end = as.Date(end))
We should find that ID 1 is in range 3 times, ID 2 is in range once, ID 3 is in range zero times. We will remove id 5 from tbl1 because it is not found in tbl2, and we will ignore ID 4 because it is not in tbl1.
This can be accomplished in a couple of steps. First we use an inner_join
to keep only the matching ids. Then we determine whether the date is in_range
, finally, for each id
, we calculate the number of times the date was in_range
.
tbl3 <- tbl1 %>%
inner_join(tbl2, by = "id") %>%
mutate(in_range = date > start & date < end) %>%
group_by(id) %>%
summarize(count = sum(in_range))