Filling NAs with a monthly sequence

Question

I want to fill NAs with a monthly sequence. I tried to do it using

fill(dates,seq.Dates(dates,%m+%))

but it didnt work.

My current date is the following:

dates <- structure(list(dates = structure(c(1L, NA, NA, NA, NA, 2L, NA, 
NA, NA, 3L), .Label = c("2019-02-01", "2019-02-05", "2019-02-09"
), class = "factor")), row.names = c(NA, -10L), class = "data.frame")

My desired output should look something like this:

Desired output

That's not what `fill` does. If you take a look at the docs, it fills NAs with the most recent (either upward or downward) non-NA value. You might instead want to make complete sequences and join or coalesce with this data. Similar ideas [here](https://stackoverflow.com/q/49562597/5325862) and [here](https://stackoverflow.com/q/31465384/5325862) — camille, Dec 23 '19 at 14:00
You also probably want your dates to be proper date objects; right now they're factors. You didn't give specifics of what about your code didn't work, but that's one likely reason — camille, Dec 23 '19 at 14:09

camille · Accepted Answer · 2019-12-23T15:01:27.173

First, like I mentioned above, your dates aren't actually dates—they're factors that you'll need to convert.

I'm not well-versed in the imputation packages—other folks might have ideas with those—but that might be overkill for a simple example like this anyway. Assign groups based on whether the date is already present, then count along those groups and use that as an offset of how many months to add on.

Your actual output can be more concise: you don't necessarily need to keep the group column, and you probably want to overwrite the date column, whereas for presentation I've put the filled in dates in a separate column.

library(dplyr)
library(lubridate)

dates %>%
  mutate(dates = as.Date(dates),
         group = cumsum(!is.na(dates))) %>%
  group_by(group) %>%
  tidyr::fill(dates) %>%
  mutate(filled = dates + months(seq_along(dates) - 1))
#> # A tibble: 10 x 3
#> # Groups:   group [3]
#>    dates      group filled    
#>    <date>     <int> <date>    
#>  1 2019-02-01     1 2019-02-01
#>  2 2019-02-01     1 2019-03-01
#>  3 2019-02-01     1 2019-04-01
#>  4 2019-02-01     1 2019-05-01
#>  5 2019-02-01     1 2019-06-01
#>  6 2019-02-05     2 2019-02-05
#>  7 2019-02-05     2 2019-03-05
#>  8 2019-02-05     2 2019-04-05
#>  9 2019-02-05     2 2019-05-05
#> 10 2019-02-09     3 2019-02-09

Edited to add: I toyed with a second method where you instead count by how many observations each date has, then make a list-column of your sequence, and unnest it. I don't think it's better, and it's more convoluted (especially from having to essentially group_by twice), but maybe there's a use-case for it.

dates %>%
  mutate(dates = as.Date(dates)) %>%
  tidyr::fill(dates) %>%
  count(dates) %>%
  group_by(dates) %>%
  mutate(filled = list(seq.Date(from = dates, by = "1 month", length.out = n))) %>%
  tidyr::unnest(filled)
# same output as above

@IceCreamToucan without filling in the starting date, you're adding e.g. 2 months to the date NA. You need to fill to get a baseline to add on to — camille, Dec 23 '19 at 14:40
@IceCreamToucan yeah I did consider doing something like that to pull out the first date in each group, but figured `tidyr::fill` was easy/foolproof enough — camille, Dec 23 '19 at 14:51

score 2 · Answer 2 · answered Dec 23 '19 at 14:09

Here is a base R solution

dout <- Reduce(rbind,lapply(split(dates,cumsum(!is.na(dates$dates))), 
                            function(v) v <- within(v, dates <- seq(as.Date(dates[1]), 
                                                                    length = nrow(v),
                                                                    by="1 month"))))

such that

> dout
        dates
1  2019-02-01
2  2019-03-01
3  2019-04-01
4  2019-05-01
5  2019-06-01
6  2019-02-05
7  2019-03-05
8  2019-04-05
9  2019-05-05
10 2019-02-09

Wow. Did not know about `by="1 month"`. Very nifty! – SmokeyShakers Dec 23 '19 at 14:12 — SmokeyShakers, Dec 23 '19 at 14:12

score 0 · Answer 3 · answered Dec 23 '19 at 14:47

Another option involving dplyr, tidyr and lubridate could be:

dates %>%
 fill(dates) %>%
 group_by(rleid = group_indices(., dates)) %>%
 mutate(dates = ymd(dates) + months(1:n()-1)) %>%
 ungroup() %>%
 select(-rleid)

   dates     
   <date>    
 1 2019-02-01
 2 2019-03-01
 3 2019-04-01
 4 2019-05-01
 5 2019-06-01
 6 2019-02-05
 7 2019-03-05
 8 2019-04-05
 9 2019-05-05
10 2019-02-09

Filling NAs with a monthly sequence

3 Answers3