0

I have a data frame with time windows on each row. The time window is identified by a start_date and end _date for each ID. For each calendar day, I would like to know how may IDs have a time window spanning that day.

Example data

data <- data.frame(
id = c("A","B","C"), 
start_date = as.POSIXct(c("2020-01-01 01:00:00", "2020-01-02 01:00:00", "2020-01-03 01:00:00")),
end_date = as.POSIXct(c("2020-01-04 01:00:00", "2020-01-03 01:00:00", "2020-01-06 01:00:00")),
stringsAsFactors = FALSE
)

data

  id          start_date            end_date
1  A 2020-01-01 01:00:00 2020-01-04 01:00:00
2  B 2020-01-02 01:00:00 2020-01-03 01:00:00
3  C 2020-01-03 01:00:00 2020-01-06 01:00:00

The output I am looking for is to aggregate this into days with number of IDs present on each day.

day         number_of_ids
2020-01-01  1
2020-01-02  2
2020-01-03  3
2020-01-04  2
2020-01-05  1
2020-01-06  1

Any help much appreciated.

MLyall
  • 119
  • 9

2 Answers2

0

We get the sequence of dates between corresponding 'start_date', 'end_date' in a list column, unnest the list column, then do a group by 'day' and get the number of distinct 'id' with n_distinct in summarise

library(dplyr)
library(purrr)
library(tidyr)
data %>%
    transmute(id, day = map2(as.Date(start_date), as.Date(end_date), 
          ~ seq(.x, .y, by = 'day'))) %>%
    unnest(c(day)) %>% 
    group_by(day) %>%
    summarise(number_of_ids = n_distinct(id))
# A tibble: 6 x 2
#  day        number_of_ids
#  <date>             <int>
#1 2020-01-01             1
#2 2020-01-02             2
#3 2020-01-03             3
#4 2020-01-04             2
#5 2020-01-05             1
#6 2020-01-06             1
akrun
  • 874,273
  • 37
  • 540
  • 662
0

In base R you could do:

a <- with(data, setNames(Map( function(x, y) format(seq(x,y,'1 day'), '%F'), start_date, end_date),id))
aggregate(ind~values, stack(a), length)
      values ind
1 2020-01-01   1
2 2020-01-02   2
3 2020-01-03   3
4 2020-01-04   2
5 2020-01-05   1
6 2020-01-06   1
Onyambu
  • 67,392
  • 3
  • 24
  • 53