[Solution by Colin Charles worked for me]
This is closely related to R: Insert rows for missing dates/times
I am implementing the answer provided in that post for my own dataset which has a similar problem of gaps in the time series. The smallest unit of time I am interested in is day.
head(data_wide)
Date MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6
1 8/16/14 12:00 AM 14 61 117 187 s1 <NA>
6 8/17/14 12:00 AM 14a 193 56 173 s2 <NA>
11 8/25/14 12:00 AM 4 21 80 187 s2 <NA>
16 8/26/14 12:00 AM 9 1 120 173 s2 <NA>
21 8/27/14 12:00 AM 13 15 100 187 s2 <NA>
26 8/28/14 12:00 AM 5 21 17 173 s2 <NA>
As you can see, there are gaps in the time series. I would like to have entries in the time series for missing days.
My approach was to use the solution provided in the link I referenced above.
Here is my implementation:
data_wide$Date<-as.POSIXct(data_wide$Date,format="%m/%d/%y")
data_wide.zoo<-zoo(data_wide[,-1],data_wide[,1]) #set date to Index
df2 <- merge(data_wide.zoo,zoo(,seq(start(data_wide.zoo),end(data_wide.zoo),by="day")), all=TRUE)
final<-as.data.frame(df2)
Initially, I thought things were perfect
head(final)
MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6
2014-08-16 12:00:00 14 61 117 187 s1 <NA>
2014-08-17 12:00:00 14a 193 56 173 s2 <NA>
2014-08-18 12:00:00 <NA> <NA> <NA> <NA> <NA> <NA>
2014-08-19 12:00:00 <NA> <NA> <NA> <NA> <NA> <NA>
2014-08-20 12:00:00 <NA> <NA> <NA> <NA> <NA> <NA>
2014-08-21 12:00:00 <NA> <NA> <NA> <NA> <NA> <NA>
Looking good so far.
But then I get to a place in the time series where the days are duplicated:
106 2014-11-18 12:00:00 242a 272 246 238b s2 NA
107 2014-11-19 11:00:00 NA NA NA NA NA NA
108 2014-11-19 12:00:00 242b 246 251 238c s2 NA
109 2014-11-20 11:00:00 NA NA NA NA NA NA
110 2014-11-20 12:00:00 242c 249 269 s2 NA NA
111 2014-11-21 11:00:00 NA NA NA NA NA NA
112 2014-11-21 12:00:00 242d 245 249 s2 NA NA
113 2014-11-22 11:00:00 NA NA NA NA NA NA
114 2014-11-22 12:00:00 241a 247 235a 238d s2 NA
Some days are duplicated, and I have no reason to think that these days should be duplicated whereas the other days are not duplicated. Does anyone have suggestions as to how to resolve this? I am not sure where the issue is coming from.
[EDIT]: I can essentially correct the dataframe by removing rows where the time is 11:00. Removing rows with NA
would remove all the hard work of filling in the time series, so this is not a possibility. However, I am concerned that this is not generalizable to all of the dataframes that I will be completing time series for (i.e. will all of them produce rows where time is 11:00 on the merge that can be excluded???)
final2<-final[ grepl( "12:00" , rownames(final) ) , ]