Fill missing dates by group

Question

In my data, there exist observations for some IDs in some months and not for others, e.g.

dat <- data.frame(c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4), c(rep(30, 2), rep(25, 5), rep(20, 3)), c('2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01', '2017-01-01', '2017-02-01', '2017-03-01', '2017-01-01',
                    '2017-02-01', '2017-04-01'))
colnames(dat) <- c('id', 'value', 'date')

I would like to, for each id value, insert a row that includes the month(s) missing for that id and NA for value.

Is there a way to (somewhat) concisely do this for all months in seq(min(as.Date(dat$date)), max(as.Date(dat$date)), by = 'months')? I often use tidyverse and data.table, but am open to any approach.

Is this what you want? https://stackoverflow.com/questions/46130246/filling-missing-dates-in-a-grouped-time-series-a-tidyverse-way — thelatemail, Feb 06 '18 at 00:03

Rich Pauloo · Accepted Answer · 2021-01-11T04:20:31.267

`tidyr::complete()` fills missing values

add id and date as the columns (...) to expand for

library(tidyverse)

complete(dat, id, date)


# A tibble: 16 x 3
      id date       value
   <dbl> <date>     <dbl>
 1  1.00 2017-01-01  30.0
 2  1.00 2017-02-01  30.0
 3  1.00 2017-03-01  NA  
 4  1.00 2017-04-01  25.0
 5  2.00 2017-01-01  NA  
 6  2.00 2017-02-01  25.0
 7  2.00 2017-03-01  NA  
 8  2.00 2017-04-01  NA  
 9  3.00 2017-01-01  25.0
10  3.00 2017-02-01  25.0
11  3.00 2017-03-01  25.0
12  3.00 2017-04-01  NA  
13  4.00 2017-01-01  20.0
14  4.00 2017-02-01  20.0
15  4.00 2017-03-01  NA  
16  4.00 2017-04-01  20.0

score 9 · Answer 2 · answered Feb 06 '18 at 00:08

tidyr::complete() works given your example data:

library(tidyverse)
dat %>% 
  group_by(id) %>% 
  complete(date) %>% 
  ungroup()

      id date       value
   <dbl> <fct>      <dbl>
 1  1.00 2017-01-01  30.0
 2  1.00 2017-02-01  30.0
 3  1.00 2017-03-01  NA  
 4  1.00 2017-04-01  25.0
 5  2.00 2017-01-01  NA  
 6  2.00 2017-02-01  25.0
 7  2.00 2017-03-01  NA  
 8  2.00 2017-04-01  NA  
 9  3.00 2017-01-01  25.0
10  3.00 2017-02-01  25.0
11  3.00 2017-03-01  25.0
12  3.00 2017-04-01  NA  
13  4.00 2017-01-01  20.0
14  4.00 2017-02-01  20.0
15  4.00 2017-03-01  NA  
16  4.00 2017-04-01  20.0

This works, but it's a bit more complicated than it needs to be. You can simply supply `id` and `date` as columns to expand in `complete`, ad avoid grouping and ungrouping. — Rich Pauloo, Feb 06 '18 at 00:23

score 2 · Answer 3 · answered Feb 06 '18 at 00:09

Here is a way to do it using expand.grid and merge:

dat <- data.frame(c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4), c(rep(30, 2), rep(25, 5), rep(20, 3)), as.Date(c('2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01', '2017-01-01', '2017-02-01', '2017-03-01', '2017-01-01',
                                                                                            '2017-02-01', '2017-04-01')))
colnames(dat) <- c('id', 'value', 'date')

date_range <- seq(min(as.Date(dat$date)), max(as.Date(dat$date)), by = 'months')

dat_expanded <- expand.grid(date_range, dat$id)

colnames(dat_expanded) <- c("date", "id")

result <- merge(dat, dat_expanded, by=c("id", "date"), all.y = T)

To avoid duplicate rows in dat_expanded, you should have used dat_expanded <- expand.grid(date_range, unique(dat$id)) — user131476, Mar 08 '22 at 14:05

Fill missing dates by group

3 Answers3

`tidyr::complete()` fills missing values

Linked

Related

Fill missing dates by group

3 Answers3

tidyr::complete() fills missing values

Linked

Related

`tidyr::complete()` fills missing values