0

Let's say I have a dataframe:

df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'), 
time = c(1,2,4,1,2,3,5,7,8), 
data = c(5,6,7,8,9,10,1,2,3))

What I want to do is insert data into the data frame where it was missing in the sequence. So in the above example, I'm missing data for time = 3 for group A, and time = 4 for Group B and time =6 for Group C. I would essentially want to put NAs in the place of the data column. How would I go about adding these additional rows? I need a generalized solution NOTE: I EDITED THE QUESTION AS THERE WAS AN ERROR EARLIER WE CANNOT ASSUME THAT THERE WILL BE ONLY 4 OBSERVATIONS FOR EACH GROUP.

The goal would be:

  df <- data.frame(group = c('A','A','A','A','B','B','B','C','C','C','C'), 
    time = c(1,2,3,4,1,2,3,5,6,7,8), 
    data = c(5,6,NA,7,8,9,10,1,NA,2,3))
  • "So in the above example, I'm missing data for time = 3 for group A, and time = 4 for Group B and time =6 for Group C." - How do you know this? Is there another data structure that tells you this, or is there a criterion you can articulate by which you can infer this from the shown data frame? – Amadan Dec 07 '16 at 01:26
  • One option is following: `df.fill = merge(df, expand.grid(group=unique(df$group), time=min(df$time):max(df$time)), all=TRUE)`. `expand.grid` creates a data frame containing all possible combinations of `group` and `time`. You merge that into your original data frame as full join (`all=TRUE` takes care of that) giving you new rows for the missing combinations. – eipi10 Dec 07 '16 at 01:54
  • This question seems like a duplicate of [this](http://stackoverflow.com/questions/9996452/r-find-and-add-missing-non-existing-rows-in-time-related-data-frame), [this](http://stackoverflow.com/questions/16787038/r-insert-rows-for-missing-dates-times), [this](http://stackoverflow.com/questions/31150028/insert-missing-time-rows-into-a-dataframe) and probably other SO questions. – eipi10 Dec 07 '16 at 01:59

1 Answers1

0

Here is one option using data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), expand the dataset grouped by 'group' from min to max of 'time' and join on the 'group' and 'time' columns.

library(data.table)
setDT(df)[df[, .(time = min(time):max(time)) , by = group], on = c("group", "time")]
#    group time data
# 1:     A    1    5
# 2:     A    2    6
# 3:     A    3   NA
# 4:     A    4    7
# 5:     B    1    8
# 6:     B    2    9
# 7:     B    3   10
# 8:     C    5    1
# 9:     C    6   NA
#10:     C    7    2
#11:     C    8    3
akrun
  • 874,273
  • 37
  • 540
  • 662