R: Inserting missing dates in time series duplicates some days on merge, but not others

Question

[Solution by Colin Charles worked for me]

This is closely related to R: Insert rows for missing dates/times

I am implementing the answer provided in that post for my own dataset which has a similar problem of gaps in the time series. The smallest unit of time I am interested in is day.

head(data_wide)
     Date            MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6
1  8/16/14 12:00 AM     14     61    117    187     s1   <NA>
6  8/17/14 12:00 AM    14a    193     56    173     s2   <NA>
11 8/25/14 12:00 AM      4     21     80    187     s2   <NA>
16 8/26/14 12:00 AM      9      1    120    173     s2   <NA>
21 8/27/14 12:00 AM     13     15    100    187     s2   <NA>
26 8/28/14 12:00 AM      5     21     17    173     s2   <NA>

As you can see, there are gaps in the time series. I would like to have entries in the time series for missing days.

My approach was to use the solution provided in the link I referenced above.

Here is my implementation:

data_wide$Date<-as.POSIXct(data_wide$Date,format="%m/%d/%y")
data_wide.zoo<-zoo(data_wide[,-1],data_wide[,1]) #set date to Index
df2 <- merge(data_wide.zoo,zoo(,seq(start(data_wide.zoo),end(data_wide.zoo),by="day")), all=TRUE)
final<-as.data.frame(df2)

Initially, I thought things were perfect

head(final)
                    MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6
2014-08-16 12:00:00 14     61     117    187    s1     <NA>  
2014-08-17 12:00:00 14a    193    56     173    s2     <NA>  
2014-08-18 12:00:00 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
2014-08-19 12:00:00 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
2014-08-20 12:00:00 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
2014-08-21 12:00:00 <NA>   <NA>   <NA>   <NA>   <NA>   <NA>

Looking good so far.

But then I get to a place in the time series where the days are duplicated:

106 2014-11-18 12:00:00 242a    272 246 238b    s2  NA
107 2014-11-19 11:00:00 NA  NA  NA  NA  NA  NA
108 2014-11-19 12:00:00 242b    246 251 238c    s2  NA
109 2014-11-20 11:00:00 NA  NA  NA  NA  NA  NA
110 2014-11-20 12:00:00 242c    249 269 s2  NA  NA
111 2014-11-21 11:00:00 NA  NA  NA  NA  NA  NA
112 2014-11-21 12:00:00 242d    245 249 s2  NA  NA
113 2014-11-22 11:00:00 NA  NA  NA  NA  NA  NA
114 2014-11-22 12:00:00 241a    247 235a    238d    s2  NA

Some days are duplicated, and I have no reason to think that these days should be duplicated whereas the other days are not duplicated. Does anyone have suggestions as to how to resolve this? I am not sure where the issue is coming from.

[EDIT]: I can essentially correct the dataframe by removing rows where the time is 11:00. Removing rows with NA would remove all the hard work of filling in the time series, so this is not a possibility. However, I am concerned that this is not generalizable to all of the dataframes that I will be completing time series for (i.e. will all of them produce rows where time is 11:00 on the merge that can be excluded???)

final2<-final[ grepl( "12:00" , rownames(final) ) , ]

try changing the by argument in your seq() to by="DSTday". If you sequence over a range of dates sometimes the data accounts for the time change in the spring or fall `df2 <- merge(data_wide.zoo,zoo(,seq(start(data_wide.zoo),end(data_wide.zoo),by="DSTday")), all=TRUE)` — s_scolary, Nov 18 '15 at 21:33

R: Inserting missing dates in time series duplicates some days on merge, but not others

0 Answers0