Expand start and end dates to unbalanced monthly panel with dplyr

Question

I have start and end dates for events that I want to expand into a monthly panel, and I wanted to know if there was any tool in dplyr for solving this problem. The following code does what I want to do with ddply(). It first creates an example tibble data.frame (called "wide") where "id" represents an individual and "HomeNum" is an event for that individual. The next line creates a "date" variable that is a monthly series from "StartDate" to "FinishDate" within each "id" by "HomeNum" group.

library(plyr)
library(dplyr)
library(tibble)
wide = 
    tibble(
        id = c(1, 1, 2, 2, 2),
        HomeNum = c(0,1,0,1,2),
        StartDate = as.Date(c("2001-01-01", "2001-03-01", "2000-04-01", "2001-02-01", "2002-08-01")),
        FinishDate = as.Date(c("2001-02-01", "2002-05-01", "2001-01-01", "2002-07-01", "2002-12-01"))
    )
panel = 
    ddply(wide, 
          ~id+HomeNum, 
          transform, 
          date = seq.Date(StartDate, FinishDate, by = "month")
    )

I assume that dplyr, as the "the next iteration of plyr", must have some way to implement a similar solution (and output a tibble), but the following did not work:

panel = 
    wide %>% 
    group_by(id, HomeNum) %>% 
    mutate(date = seq.Date(StartDate, FinishDate, by = "month"))

and returned

Error in mutate_impl(.data, dots) :
    Column `date` must be length 1 (the group size), not 2

Frankly, I am surprised that the ddply() solution works and does not throw a similar error.

My implementation with ddply() is similar to answers to this question.

score 3 · Accepted Answer · answered May 21 '18 at 03:00

3

You can coerce the elements of date to lists and unnest.

library(tidyverse)
wide %>%
  group_by(id, HomeNum) %>%
  mutate(date = list(seq.Date(StartDate, FinishDate, by = "month"))) %>%
  unnest(date)

answered May 21 '18 at 03:00

hpesoj626

3,529
1
17
25

I cannot get this to work. I get "Error: Each column must either be a list of vectors or a list of data frames". – randy May 21 '18 at 03:24
Unnesting lists of dates is an issue in `tidyr` that should be fixed in the most recent version(s). Referenced [here](https://github.com/tidyverse/tidyr/issues/407). I got this error, updated to 0.8.1, and no longer get it, but will post a workaround – camille May 22 '18 at 16:18

camille · Answer 2 · 2018-05-23T11:43:58.650

Using unnest on a list of dates was an issue in previous versions of tidyr. I got this same error and found a workaround, but then no longer needed the workaround once I updated to tidyr 0.8.1. It's an issue that's documented in a few issues on GitHub—#407 and #450 were ones I looked at.

If you have a version that can't unnest dates, you can build on @hpesoj626's answer by converting the dates to strings, unnesting, then converting the strings back to dates.

library(tidyverse)

wide <- tibble(
    id = c(1, 1, 2, 2, 2),
    HomeNum = c(0,1,0,1,2),
    StartDate = as.Date(c("2001-01-01", "2001-03-01", "2000-04-01", "2001-02-01", "2002-08-01")),
    FinishDate = as.Date(c("2001-02-01", "2002-05-01", "2001-01-01", "2002-07-01", "2002-12-01"))
  )

# with previous versions of tidyr
wide %>%
  group_by(id, HomeNum) %>%
  mutate(date = list(seq.Date(StartDate, FinishDate, by = "month") %>% as.character())) %>%
  tidyr::unnest() %>%
  mutate(date = as.Date(date))
#> # A tibble: 50 x 5
#> # Groups:   id, HomeNum [5]
#>       id HomeNum StartDate  FinishDate date      
#>    <dbl>   <dbl> <date>     <date>     <date>    
#>  1     1       0 2001-01-01 2001-02-01 2001-01-01
#>  2     1       0 2001-01-01 2001-02-01 2001-02-01
#>  3     1       1 2001-03-01 2002-05-01 2001-03-01
#>  4     1       1 2001-03-01 2002-05-01 2001-04-01
#>  5     1       1 2001-03-01 2002-05-01 2001-05-01
#>  6     1       1 2001-03-01 2002-05-01 2001-06-01
#>  7     1       1 2001-03-01 2002-05-01 2001-07-01
#>  8     1       1 2001-03-01 2002-05-01 2001-08-01
#>  9     1       1 2001-03-01 2002-05-01 2001-09-01
#> 10     1       1 2001-03-01 2002-05-01 2001-10-01
#> # ... with 40 more rows

Otherwise, a solution like the one they posted should work:

# with tidyr 0.8.1
wide %>%
  group_by(id, HomeNum) %>%
  mutate(date = list(seq.Date(StartDate, FinishDate, by = "month"))) %>%
  tidyr::unnest()
#> # A tibble: 50 x 5
#> # Groups:   id, HomeNum [5]
#>       id HomeNum StartDate  FinishDate date      
#>    <dbl>   <dbl> <date>     <date>     <date>    
#>  1     1       0 2001-01-01 2001-02-01 2001-01-01
#>  2     1       0 2001-01-01 2001-02-01 2001-02-01
#>  3     1       1 2001-03-01 2002-05-01 2001-03-01
#>  4     1       1 2001-03-01 2002-05-01 2001-04-01
#>  5     1       1 2001-03-01 2002-05-01 2001-05-01
#>  6     1       1 2001-03-01 2002-05-01 2001-06-01
#>  7     1       1 2001-03-01 2002-05-01 2001-07-01
#>  8     1       1 2001-03-01 2002-05-01 2001-08-01
#>  9     1       1 2001-03-01 2002-05-01 2001-09-01
#> 10     1       1 2001-03-01 2002-05-01 2001-10-01
#> # ... with 40 more rows

Another option is to gather the data into a long format, where observations have a type column showing whether it's the start or finish date. Then use complete to fill in missing dates between each group's minimum and maximum dates. Gathering keeps the type column, which gets filled in as NA for the dates that are added. You could then drop the type column if it's no longer useful.

wide %>%
  gather(key = type, value = date, StartDate, FinishDate) %>%
  group_by(id, HomeNum) %>%
  complete(date = seq.Date(min(date), max(date), by = "month"))
#> # A tibble: 50 x 4
#> # Groups:   id, HomeNum [5]
#>       id HomeNum date       type      
#>    <dbl>   <dbl> <date>     <chr>     
#>  1     1       0 2001-01-01 StartDate 
#>  2     1       0 2001-02-01 FinishDate
#>  3     1       1 2001-03-01 StartDate 
#>  4     1       1 2001-04-01 <NA>      
#>  5     1       1 2001-05-01 <NA>      
#>  6     1       1 2001-06-01 <NA>      
#>  7     1       1 2001-07-01 <NA>      
#>  8     1       1 2001-08-01 <NA>      
#>  9     1       1 2001-09-01 <NA>      
#> 10     1       1 2001-10-01 <NA>      
#> # ... with 40 more rows

Created on 2018-05-22 by the reprex package (v0.2.0).

I edited your 0.8,1 solution to remove the `= as.Date(date)` from the `mutate()` function, as date was already stored as a Date in that version. Also, is there any benefit to the second `mutate()` function? The code seems to have the same output without it. — randy, May 23 '18 at 04:36
@randy thanks for catching that, the second mutate was left over from debugging with the earlier version — camille, May 23 '18 at 11:43

Expand start and end dates to unbalanced monthly panel with dplyr

2 Answers2