5

I have a dataframe with a start and end date for each row. I would like to calculate the number of days between the two dates and split it by year. So going from this:

id <- c(1,2,3)
start <- as.Date(c('01/01/2015','01/01/2016','07/01/2015'), format = '%m/%d/%Y')
end <- as.Date(c('12/31/2016','12/31/2016','12/31/2016'), format = '%m/%d/%Y')
df <- data.frame(id, start, end)
id Start End
1 01/01/2015 12/31/2016
2 01/01/2016 12/31/2016
3 01/07/2015 12/31/2016

To this:

id Start End days_no. year_2015 year_2016
1 01/01/2015 12/31/2016 730 365 365
2 01/01/2016 12/31/2016 365 0 365
3 07/01/2015 12/31/2016 548 183 365

Any help is appreciated, please note i would like to compute the yearly stats dynamically, i may end up with many year columns in my practical case... I'm guessing lubridate may help but i'm unsure where to start.

Romain
  • 171
  • 11

2 Answers2

3

Here is a base R option

transform(
  df,
  days_no = end - start,
  year_2015 = pmax(as.Date("2015-12-31") - start, 0),
  year_2016 = pmax(end - as.Date("2016-1-1"), 0)
)

which gives

  id      start        end  days_no year_2015 year_2016
1  1 2015-01-01 2016-12-31 730 days  364 days  365 days
2  2 2016-01-01 2016-12-31 365 days    0 days  365 days
3  3 2015-07-01 2016-12-31 549 days  183 days  365 days
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
1

Here's one approach using tidyverse and lubridate.

First, separate the rows by calendar year, to use to measure the number of days for each year. Each row will include dates to be counted in each calendar year, starting with January 1st and ending with December 31st if overlapping multiple years. Then, it is easy to calculate the number of days in a given year.

The results from this example are slightly different than what I have. Year 2016 is a leap year and has 366 days. If the number of days are not inclusive of either start or end dates, you would get a different answer.

Edit (6/8/23): Based on comment below, added 1 for inclusive dates. This would give 366 days for leap years, such as 2016, and 365 days for other years.

library(tidyverse)
library(lubridate)

df %>%
  mutate(date_int = interval(start, end),
         year = map2(year(start), year(end), seq)) %>%
  unnest(year) %>%
  mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
         year_sect = intersect(date_int, year_int),
         start_new = as.Date(int_start(year_sect)),
         end_new = as.Date(int_end(year_sect))) %>%
  select(id, start_new, end_new) %>%
  mutate(year = year(start_new),
         days = as.numeric(end_new - start_new) + 1) %>%
  right_join(df) %>%
  pivot_wider(id_cols = c(id, start, end), names_from = year, values_from = days, names_prefix = "year_", values_fill = list(days = 0)) %>%
  mutate(days_number = reduce(select(., starts_with("year_")), `+`))

Output

     id start      end        year_2015 year_2016 days_number
  <dbl> <date>     <date>         <dbl>     <dbl>       <dbl>
1     1 2015-01-01 2016-12-31       365       366         731
2     2 2016-01-01 2016-12-31         0       366         366
3     3 2015-07-01 2016-12-31       184       366         550
Ben
  • 28,684
  • 5
  • 23
  • 45
  • This works wonderfully. Thanks a lot! I had to reorder the columns a bit but this is great... Does it not require purr to be installed too? – Romain Apr 13 '21 at 20:48
  • Glad to hear! Yes - `reduce` and `map2` use `purrr` but I thought this was loaded with `tidyverse`... – Ben Apr 13 '21 at 21:05
  • It is loaded with tidyverse, absolutely, my bad... All good then! – Romain Apr 13 '21 at 21:47
  • 1
    I'd recommend adding 1 to the calculation, `days = as.numeric(end_new - start_new) + 1`. You point out the odd behavior, but if you had a multi-year interval from, say 2010-01-01 to 2020-01-01, it doesn't make any sense to have it only count 364 days for years like 2011, 2013, 2014, etc, that are wholly contained in the interval (which it currently does). – Gregor Thomas Jun 08 '23 at 18:48
  • 1
    @GregorThomas Thanks, I see what you mean. I revised the answer - now, 2016 is 366 days (leap year), and other years should be up to 365 days (inclusive). – Ben Jun 08 '23 at 19:33