Expanding long format time series data with missing Rows

Question

Let's say I have a dataframe:

df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'), 
time = c(1,2,4,1,2,3,5,7,8), 
data = c(5,6,7,8,9,10,1,2,3))

What I want to do is insert data into the data frame where it was missing in the sequence. So in the above example, I'm missing data for time = 3 for group A, and time = 4 for Group B and time =6 for Group C. I would essentially want to put NAs in the place of the data column. How would I go about adding these additional rows? I need a generalized solution NOTE: I EDITED THE QUESTION AS THERE WAS AN ERROR EARLIER WE CANNOT ASSUME THAT THERE WILL BE ONLY 4 OBSERVATIONS FOR EACH GROUP.

The goal would be:

  df <- data.frame(group = c('A','A','A','A','B','B','B','C','C','C','C'), 
    time = c(1,2,3,4,1,2,3,5,6,7,8), 
    data = c(5,6,NA,7,8,9,10,1,NA,2,3))

"So in the above example, I'm missing data for time = 3 for group A, and time = 4 for Group B and time =6 for Group C." - How do you know this? Is there another data structure that tells you this, or is there a criterion you can articulate by which you can infer this from the shown data frame? — Amadan, Dec 07 '16 at 01:26
One option is following: `df.fill = merge(df, expand.grid(group=unique(df$group), time=min(df$time):max(df$time)), all=TRUE)`. `expand.grid` creates a data frame containing all possible combinations of `group` and `time`. You merge that into your original data frame as full join (`all=TRUE` takes care of that) giving you new rows for the missing combinations. — eipi10, Dec 07 '16 at 01:54
This question seems like a duplicate of [this](http://stackoverflow.com/questions/9996452/r-find-and-add-missing-non-existing-rows-in-time-related-data-frame), [this](http://stackoverflow.com/questions/16787038/r-insert-rows-for-missing-dates-times), [this](http://stackoverflow.com/questions/31150028/insert-missing-time-rows-into-a-dataframe) and probably other SO questions. — eipi10, Dec 07 '16 at 01:59

akrun · Answer 1 · 2016-12-07T06:10:38.597

0

Here is one option using data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), expand the dataset grouped by 'group' from min to max of 'time' and join on the 'group' and 'time' columns.

library(data.table)
setDT(df)[df[, .(time = min(time):max(time)) , by = group], on = c("group", "time")]
#    group time data
# 1:     A    1    5
# 2:     A    2    6
# 3:     A    3   NA
# 4:     A    4    7
# 5:     B    1    8
# 6:     B    2    9
# 7:     B    3   10
# 8:     C    5    1
# 9:     C    6   NA
#10:     C    7    2
#11:     C    8    3

edited Dec 07 '16 at 06:10

answered Dec 07 '16 at 02:49

akrun

874,273
37
540
662

I had a typo in the question – Jaideep Samuel Dec 07 '16 at 06:07

Expanding long format time series data with missing Rows

1 Answers1

Linked