2

I'm working with a dataset similar to the following:

df <- data.frame(type = c("A", "A", "A", "A", "A", "B", "B", "B", "C", "D", "D", "D"), 
                 start_date = as.Date(c("2010-02-01", "2011-03-15", "2011-09-15", "2015-01-01", "2015-05-15", "2009-01-01", "2015-07-14", "2016-06-30", "2012-01-15", "2010-04-05", "2010-08-01", "2012-04-01"), format = "%Y-%m-%d"), 
                 end_date = as.Date(c("2010-12-31", "2011-07-31", "2014-04-04", "2015-02-15", "2016-12-15", "2013-02-16", "2015-12-31", "2016-12-31", "2015-09-17", "2010-04-10", "2010-09-30", "2013-12-31"), format = "%Y-%m-%d"))

I would like to count the number of observations at any given date.

Expected output

Essentially, I want my results to show that there is only one type until 2010-02-01, then two until 2010-04-05, then three until 2010-04-10 etc, i.e. one column with the date (one row per day) and one column with the count of type.

date count_of_type
2009-01-01 1
2009-01-02 1
2009-01-03 1
...
2010-01-31 1
2010-02-01 2
2010-02-02 2
...
2010-04-04 2
2010-04-05 3
2010-04-06 3
2010-04-07 3
2010-04-08 3
2010-04-09 3
2010-04-10 2
2010-04-11 2
...

I thought this would be would be easy to do, but can't figure it out... any idea?

Cheers,

Fred-LM
  • 300
  • 1
  • 11
  • 2
    Can you show the expected output. Do you neeed `df %>% mutate(n = as.integer(difftime(end_date, start_date, unit = 'day')))` – akrun Aug 28 '19 at 15:37
  • 1
    maybe you need something like a [Gantt Diagram](https://www.gantt.com/)? also, maybe this Q&A can help: https://stackoverflow.com/q/3550341/4137985 – Cath Aug 28 '19 at 15:43
  • 1
    I've just added the expected output – Fred-LM Aug 28 '19 at 15:50

1 Answers1

1

An option would be to transmute by taking the corresponding sequence of 'start_date', 'end_date' by day and then get the count

library(tidyverse)
df %>% 
   transmute(date = map2(start_date, end_date, seq, by = '1 day')) %>% 
   unnest %>% 
   count(date)
akrun
  • 874,273
  • 37
  • 540
  • 662