
I have the following time series with hourly values:

```r
str(ts_GM)
# An 'xts' object on 2016-07-29 01:00:00/2017-09-01 containing:
#   Data: num [1:7348, 1] 0 0 0 0 NA NA NA NA NA NA ...
#   Indexed by objects of class: [POSIXct,POSIXt] TZ: UTC
#   xts Attributes:  
#  NULL
head(ts_GM)
#                     [,1]
# 2016-07-29 01:00:00    0
# 2016-07-29 02:00:00    0
# 2016-07-29 03:00:00    0
# 2016-07-29 04:00:00    0
# 2016-07-29 06:00:00   NA
# 2016-07-29 07:00:00   NA
tail(ts_GM)
#                     [,1]
# 2017-08-31 19:00:00    0
# 2017-08-31 20:00:00    0
# 2017-08-31 21:00:00    0
# 2017-08-31 22:00:00    0
# 2017-08-31 23:00:00    0
# 2017-09-01 00:00:00    0
```

The time series has gaps because of measurement failures, and I need to compare the measured values with the theoretical hourly maximum and minimum for each day of the year. For that reason, I need to fill the gaps with NA values so that there is one timestamp for every hour from the start date to the end date of the series.

I tried:

```r
dates_GM <- seq(from = start(ts_GM), to = end(ts_GM), by = "hour")
merge(ts_GM, dates_GM, fill = NA, all = TRUE)
# and 
merge(ts_GM, dates_GM)
```

But some values seem to be duplicated: the final length of the time series is 9695, and it should be 9576. How can I do this without duplicating values?

Joshua Ulrich
  • I may be able to help if you can provide a minimal [reproducible example](https://stackoverflow.com/q/5963269/271616), and your desired output. – Joshua Ulrich Aug 01 '18 at 22:30

1 Answer


Without being able to see the actual data, my guess would be that some of the entries in ts_GM are not on exact hour boundaries. For example, maybe you have a "2016-07-29 05:00:01" or a "2016-07-29 04:59:59" entry. Because dates_GM contains "2016-07-29 05:00:00", which does not exactly match either of those, the merge does not treat them as the same time and creates a new row for each one.
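A minimal reproducible sketch of that guess, using hypothetical toy data rather than the question's series: one reading is logged at 05:00:01, and an outer merge with an hourly grid then yields one row more than the grid has. The grid is wrapped in a zero-width xts object built with xts(order.by = ...), a common idiom for this kind of merge; the last lines show a diagnostic (an assumption on my part, not from the question) that counts index entries not exactly on an hour boundary.

```r
library(xts)

# Hypothetical toy data (not the question's data): four readings, one of
# which was logged one second after the hour (05:00:01 instead of 05:00:00).
idx <- as.POSIXct(c("2016-07-29 01:00:00", "2016-07-29 02:00:00",
                    "2016-07-29 05:00:01", "2016-07-29 06:00:00"), tz = "UTC")
x <- xts(c(0, 0, 1, 2), order.by = idx)

# Regular hourly grid from the first to the last timestamp, as in the question.
grid <- seq(from = start(x), to = end(x), by = "hour")

# Outer merge with a zero-width xts object holding only the grid timestamps.
merged <- merge(x, xts(order.by = grid), fill = NA)

# 6 grid hours plus the off-hour 05:00:01 reading give 7 rows, not 6.
print(nrow(merged))

# Diagnostic that could also be run on the real series, e.g. with ts_GM in
# place of x: count index entries not exactly on a (UTC) hour boundary.
off_hour <- sum(as.numeric(index(x)) %% 3600 != 0)
print(off_hour)
```

If the diagnostic returns a non-zero count on the real data, that would support the off-hour explanation.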

The fix is therefore to tidy up ts_GM's index before doing the merge. (If you think that is the problem but don't know how to fix it, add a comment; I'll go look up some code I have that rounds off to the nearest hour.)
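A minimal sketch of that tidy-up, not the answerer's own code: the hypothetical helper round_index_to_hour() snaps each timestamp to the nearest whole UTC hour, then the series is merged with a zero-width xts object on the hourly grid. It assumes a POSIXct index. Two readings may snap to the same hour, and keeping the first of them is an arbitrary choice made here for illustration.

```r
library(xts)

# Hypothetical helper (not the answerer's code): snap each index entry to
# the nearest whole hour, with halves rounding up. Rounding is done on
# seconds since the epoch, so "whole hour" means a UTC hour boundary.
round_index_to_hour <- function(x) {
  secs <- as.numeric(index(x))
  rounded <- floor(secs / 3600 + 0.5) * 3600
  # Two readings can snap to the same hour; keep the first one (a design
  # choice, not the only option).
  keep <- !duplicated(rounded)
  xts(coredata(x)[keep, , drop = FALSE],
      order.by = .POSIXct(rounded[keep], tz = "UTC"),
      tzone = "UTC")
}

# Hypothetical toy data: one reading logged a second late (05:00:01).
idx <- as.POSIXct(c("2016-07-29 01:00:00", "2016-07-29 02:00:00",
                    "2016-07-29 05:00:01", "2016-07-29 06:00:00"), tz = "UTC")
x <- xts(c(0, 0, 1, 2), order.by = idx)

x_clean <- round_index_to_hour(x)

# Regular hourly grid, then an outer merge with a zero-width xts object
# so every grid hour gets a row and missing hours are filled with NA.
grid <- seq(from = start(x_clean), to = end(x_clean), by = "hour")
filled <- merge(x_clean, xts(order.by = grid), fill = NA)
print(filled)
```

After rounding, the merged series has exactly one row per grid hour, and the late reading lands on 05:00:00.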

(I was going to also suggest that time zones might matter, but I don't think that could explain a 119-row difference; however, as a rule, do all calculations in UTC.)
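Working in UTC, where every day has exactly 24 hours, also makes the counts in the question easy to check. The sketch below uses only the start, end, and row counts stated in the question. It assumes that ts_GM's index contains no duplicated timestamps, in which case an outer merge returns the union of the two indexes, so the reported 9695 rows imply how many of ts_GM's timestamps are off the grid.

```r
# Start and end of ts_GM, copied from the question's str() output.
start_t <- as.POSIXct("2016-07-29 01:00:00", tz = "UTC")
end_t   <- as.POSIXct("2017-09-01 00:00:00", tz = "UTC")

# In UTC every day has 24 hours, so the hourly grid length is the number
# of hours between the endpoints, plus one for the start itself.
n_grid  <- length(seq(start_t, end_t, by = "hour"))
n_hours <- as.numeric(difftime(end_t, start_t, units = "hours")) + 1

# Row counts reported in the question.
n_series <- 7348  # rows in ts_GM
n_merged <- 9695  # rows after the outer merge

# For an outer merge of two series with unique timestamps:
# rows = n_series + n_grid - shared, so the timestamps of ts_GM that are
# NOT on the grid number n_series - shared.
shared   <- n_series + n_grid - n_merged
off_grid <- n_series - shared
cat(n_grid, n_hours, off_grid, "\n")
```

Under that assumption, about 119 of ts_GM's timestamps would not fall exactly on the hour.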

Darren Cook