3

I want to fill NAs with a monthly sequence. I tried to do it using

fill(dates,seq.Dates(dates,%m+%)) 

but it didnt work.

My current date is the following:

dates <- structure(list(dates = structure(c(1L, NA, NA, NA, NA, 2L, NA, 
NA, NA, 3L), .Label = c("2019-02-01", "2019-02-05", "2019-02-09"
), class = "factor")), row.names = c(NA, -10L), class = "data.frame")

My desired output should look something like this:

Desired output

camille
  • 16,432
  • 18
  • 38
  • 60
  • 1
    That's not what `fill` does. If you take a look at the docs, it fills NAs with the most recent (either upward or downward) non-NA value. You might instead want to make complete sequences and join or coalesce with this data. Similar ideas [here](https://stackoverflow.com/q/49562597/5325862) and [here](https://stackoverflow.com/q/31465384/5325862) – camille Dec 23 '19 at 14:00
  • You also probably want your dates to be proper date objects; right now they're factors. You didn't give specifics of what about your code didn't work, but that's one likely reason – camille Dec 23 '19 at 14:09
  • Those are months you are completing right? – Sotos Dec 23 '19 at 14:10

3 Answers3

5

First, like I mentioned above, your dates aren't actually dates—they're factors that you'll need to convert.

I'm not well-versed in the imputation packages—other folks might have ideas with those—but that might be overkill for a simple example like this anyway. Assign groups based on whether the date is already present, then count along those groups and use that as an offset of how many months to add on.

Your actual output can be more concise: you don't necessarily need to keep the group column, and you probably want to overwrite the date column, whereas for presentation I've put the filled in dates in a separate column.

library(dplyr)
library(lubridate)

dates %>%
  mutate(dates = as.Date(dates),
         group = cumsum(!is.na(dates))) %>%
  group_by(group) %>%
  tidyr::fill(dates) %>%
  mutate(filled = dates + months(seq_along(dates) - 1))
#> # A tibble: 10 x 3
#> # Groups:   group [3]
#>    dates      group filled    
#>    <date>     <int> <date>    
#>  1 2019-02-01     1 2019-02-01
#>  2 2019-02-01     1 2019-03-01
#>  3 2019-02-01     1 2019-04-01
#>  4 2019-02-01     1 2019-05-01
#>  5 2019-02-01     1 2019-06-01
#>  6 2019-02-05     2 2019-02-05
#>  7 2019-02-05     2 2019-03-05
#>  8 2019-02-05     2 2019-04-05
#>  9 2019-02-05     2 2019-05-05
#> 10 2019-02-09     3 2019-02-09

Edited to add: I toyed with a second method where you instead count by how many observations each date has, then make a list-column of your sequence, and unnest it. I don't think it's better, and it's more convoluted (especially from having to essentially group_by twice), but maybe there's a use-case for it.

dates %>%
  mutate(dates = as.Date(dates)) %>%
  tidyr::fill(dates) %>%
  count(dates) %>%
  group_by(dates) %>%
  mutate(filled = list(seq.Date(from = dates, by = "1 month", length.out = n))) %>%
  tidyr::unnest(filled)
# same output as above
camille
  • 16,432
  • 18
  • 38
  • 60
  • @IceCreamToucan without filling in the starting date, you're adding e.g. 2 months to the date NA. You need to fill to get a baseline to add on to – camille Dec 23 '19 at 14:40
  • 1
    @IceCreamToucan yeah I did consider doing something like that to pull out the first date in each group, but figured `tidyr::fill` was easy/foolproof enough – camille Dec 23 '19 at 14:51
2

Here is a base R solution

dout <- Reduce(rbind,lapply(split(dates,cumsum(!is.na(dates$dates))), 
                            function(v) v <- within(v, dates <- seq(as.Date(dates[1]), 
                                                                    length = nrow(v),
                                                                    by="1 month"))))

such that

> dout
        dates
1  2019-02-01
2  2019-03-01
3  2019-04-01
4  2019-05-01
5  2019-06-01
6  2019-02-05
7  2019-03-05
8  2019-04-05
9  2019-05-05
10 2019-02-09
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

Another option involving dplyr, tidyr and lubridate could be:

dates %>%
 fill(dates) %>%
 group_by(rleid = group_indices(., dates)) %>%
 mutate(dates = ymd(dates) + months(1:n()-1)) %>%
 ungroup() %>%
 select(-rleid)

   dates     
   <date>    
 1 2019-02-01
 2 2019-03-01
 3 2019-04-01
 4 2019-05-01
 5 2019-06-01
 6 2019-02-05
 7 2019-03-05
 8 2019-04-05
 9 2019-05-05
10 2019-02-09
tmfmnk
  • 38,881
  • 4
  • 47
  • 67