-2

I am trying to add a field to a dataframe that is a count of all the entries in a second dataframe for which the ID fields have the same value and the date from dataframe1 is within the date range of two different columns in dataframe2

df1$ACTIVE<- for(i in df1$ID){ifelse(DF2$ID == DF1$ID & df2$START < df1$DATE < df2$end,1,0)}

yusuzech
  • 5,896
  • 1
  • 18
  • 33
  • 4
    Please provide us some examples. Read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – yusuzech Sep 11 '19 at 22:25

1 Answers1

0

Your question is hard to understand. Some examples would go a long way to clearing up what you are trying to do. This is my understanding: You want to compare an id from tbl1 to tbl2. In cases that the id matches, you want to know how often the date from tbl1 is between a start and an end from tbl2.

First I will create some fake data.

library(tidyverse)

tbl1 <- tribble(
  ~id, ~date,
  1, "2018-01-01",
  2, "2018-01-01",
  3, "2018-01-01",
  5, "2018-01-01"
) %>% 
  mutate(date = as.Date(date))


tbl2 <- tribble(
  ~id, ~start, ~end,
  1, "2017-01-01", "2019-01-01",
  1, "2016-01-01", "2019-01-01",
  1, "2015-01-01", "2019-01-01",
  2, "2017-01-01", "2019-01-01",
  2, "2019-01-01", "2020-01-01",
  3, "2019-01-01", "2020-01-01",
  4, "2019-01-01", "202o-01-01"
) %>% 
  mutate(start = as.Date(start), 
         end = as.Date(end))

We should find that ID 1 is in range 3 times, ID 2 is in range once, ID 3 is in range zero times. We will remove id 5 from tbl1 because it is not found in tbl2, and we will ignore ID 4 because it is not in tbl1.

This can be accomplished in a couple of steps. First we use an inner_join to keep only the matching ids. Then we determine whether the date is in_range, finally, for each id, we calculate the number of times the date was in_range.

tbl3 <- tbl1 %>% 
  inner_join(tbl2, by = "id") %>% 
  mutate(in_range = date > start & date < end) %>% 
  group_by(id) %>% 
  summarize(count = sum(in_range))

Lief Esbenshade
  • 793
  • 4
  • 13