0

I have a similar question like this one: Fill in missing rows in R

However, the gaps I need to fill are not only months, but also missing years in between for one ID. This is an example:

structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B"), A = c(1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 3L), B = c(1L, 2L, 
1L, 2L, 3L, 1L, 2L, 3L, 3L), Var1 = 12:4), class = "data.frame", row.names = c(NA, 
-9L))

  ID A B Var1
1  A 1 1   12
2  A 1 2   11
3  A 3 1   10
4  A 3 2    9
5  A 3 3    8
6  B 2 1    7
7  B 2 2    6
8  B 2 3    5
9  B 3 3    4

And this is what I want it to look like:

   ID A B Var1
1   A 1 1   12
2   A 1 2   11
3   A 1 3    0
4   A 2 1    0
5   A 2 2    0
6   A 2 3    0
7   A 3 1   10
8   A 3 2    9
9   A 3 3    8
10  B 2 1    7
11  B 2 2    6
12  B 2 3    5
13  B 3 1    0
14  B 3 2    0
15  B 3 3    4

Has someone an idea how to solve it? I have already played around with the solutions mentioned above.

Jon Spring
  • 55,165
  • 4
  • 35
  • 53
Sonne SH
  • 1
  • 1
  • In your desired output, by what logic are lines 4-6 (A2X) added but B1X lines are not added? – Jon Spring Mar 03 '21 at 08:44
  • B represent my boundaries. I need to fill the gaps in between. In this example B1X is not needed. 2 and 3 are my boundaries with no gaps in between. Therefor the solutions listed below work, but actually create too may unused rows. – Sonne SH Mar 03 '21 at 10:26

3 Answers3

0
library(tidyverse) 
df <- structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", 
                      "B"), A = c(1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 3L), B = c(1L, 2L, 
                                                                             1L, 2L, 3L, 1L, 2L, 3L, 3L), Var1 = 12:4), class = "data.frame", row.names = c(NA, 
                                                                                                                                                            -9L))
df %>% 
  complete(ID, A, B, fill = list(Var1 = 0))
#> # A tibble: 18 x 4
#>    ID        A     B  Var1
#>    <chr> <int> <int> <dbl>
#>  1 A         1     1    12
#>  2 A         1     2    11
#>  3 A         1     3     0
#>  4 A         2     1     0
#>  5 A         2     2     0
#>  6 A         2     3     0
#>  7 A         3     1    10
#>  8 A         3     2     9
#>  9 A         3     3     8
#> 10 B         1     1     0
#> 11 B         1     2     0
#> 12 B         1     3     0
#> 13 B         2     1     7
#> 14 B         2     2     6
#> 15 B         2     3     5
#> 16 B         3     1     0
#> 17 B         3     2     0
#> 18 B         3     3     4

Created on 2021-03-03 by the reprex package (v1.0.0)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
0

You could use the solution described there altering it slightly for your problem.

df
full <- with(df, unique(expand.grid(ID = ID, A = A, B = B)))
complete <- merge(df, full, by = c('ID', 'A', 'B'), all.y = TRUE)
complete$Var1[is.na(complete$Var1)] <- 0
Oliver
  • 8,169
  • 3
  • 15
  • 37
0

Just in case somebody else has the same question, this is what I came up with, thanks to the answers provided:

library(tidyverse) 

df %>% group_by(ID) %>% complete(ID, A = full_seq(A,1), B, fill = list(Var1 = 0))

This code avoids that too many unused datasets are produced.

Sonne SH
  • 1
  • 1