I am trying to add rows to a dataframe based on missing values in a numerical sequence.
Here's a reprex. I want to go from this:
> df[-c(1,3,9),]
id year V1 V2 V3
2 1 2019 0.84788413 0.10418523 0.2249371
4 2 2018 0.73183889 0.66380165 0.7681833
5 2 2019 0.38263072 -0.66741116 -0.1803099
6 2 2020 -0.05915745 2.09814096 0.8558323
7 3 2018 1.42148474 -1.65590355 -0.0879526
8 3 2019 1.46178632 1.96796970 -0.3489630
10 4 2018 0.12511779 -0.91978526 -2.3880951
11 4 2019 0.93936831 -0.24440871 0.3249178
12 4 2020 -1.57864369 -0.05853787 0.6078194
To this:
id year V1 V2 V3
1 1 2018 NA NA NA
2 1 2019 0.84788413 0.10418523 0.2249371
3 1 2020 NA NA NA
4 2 2018 0.73183889 0.66380165 0.7681833
5 2 2019 0.38263072 -0.66741116 -0.1803099
6 2 2020 -0.05915745 2.09814096 0.8558323
7 3 2018 1.42148474 -1.65590355 -0.0879526
8 3 2019 1.46178632 1.96796970 -0.3489630
9 3 2020 NA NA NA
10 4 2018 0.12511779 -0.91978526 -2.3880951
11 4 2019 0.93936831 -0.24440871 0.3249178
12 4 2020 -1.57864369 -0.05853787 0.6078194
The logic is to add the missing year
rows and add NAs to the rest of the columns.
Data:
structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L), year = c(2019L,
2018L, 2019L, 2020L, 2018L, 2019L, 2018L, 2019L, 2020L), V1 = c(0.847884128902485,
0.731838887436047, 0.382630718058478, -0.0591574520333011, 1.42148473746568,
1.46178631522088, 0.125117791300285, 0.939368308197552, -1.57864368576782
), V2 = c(0.104185228129027, 0.663801650973095, -0.667411160654917,
2.09814095835567, -1.65590354896798, 1.96796970263568, -0.919785264321656,
-0.244408708889214, -0.0585378742959754), V3 = c(0.224937129454626,
0.7681832776488, -0.180309905647701, 0.855832252932298, -0.0879525996394009,
-0.34896299605019, -2.38809514212219, 0.324917787941616, 0.607819444746004
)), row.names = c(2L, 4L, 5L, 6L, 7L, 8L, 10L, 11L, 12L), class = "data.frame")