I want to add missing observations in my panel data set, but keep running into memory issues.
I use the following code (based on this topic):
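library(dplyr)
library(tidyr)  # complete() and full_seq() come from tidyr
group_by(df, group) %>%
  complete(time = full_seq(time, 1L)) %>%  # add the missing time steps within each group
  mutate_each(funs(replace(., which(is.na(.)), 0)), -group, -time)  # fill the new NAs with 0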
My data would look similar to the data in that topic, thus:
group time value
1 1 50
1 3 52
1 4 10
2 1 4
2 4 84
2 5 2
which I would like to look like this:
group time value
1 1 50
1 2 0
1 3 52
1 4 10
2 1 4
2 2 0
2 3 0
2 4 84
2 5 2
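In case it helps to reproduce this, the toy example above can be built as follows. This is only a sketch of the sample data, not my real file; `df` matches the name used in my code, and `expected` is a name I made up here for the desired result.

```r
# Sample input from the first table; `df` matches the name used in my code.
df <- data.frame(
  group = c(1L, 1L, 1L, 2L, 2L, 2L),
  time  = c(1L, 3L, 4L, 1L, 4L, 5L),
  value = c(50, 52, 10, 4, 84, 2)
)

# Desired result from the second table: every time step between a group's
# first and last observation is present, and the added rows have value 0.
# (`expected` is a hypothetical name, used only for this illustration.)
expected <- data.frame(
  group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  time  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L),
  value = c(50, 0, 52, 10, 4, 0, 0, 84, 2)
)
```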
The problem is that I keep running out of memory, even though it is only a 1 GB file with around 1.5 million observations. Any suggestions on how to do this differently?