0

I am struggling to find a way to aggregate a zoo object to weekly results with gaps in weekly measurements. This is to use diff and other functions (e.g. acf) on the results.

library(zoo)
library(xts)

I am creating a zoo object with a little part of my data:

time_data <- structure(list(day = structure(c(14246, 14247, 14248, 14249, 14250, 14277, 14278, 14279, 14280, 14281, 14305, 14306, 14307, 14308, 14309), class = "Date"), n_daily = c(10L, 15L, 2L, 15L, 6L, 4L, 6L, 8L, 6L, 1L, 20L, 5L, 8L, 9L, 4L)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"))

z_td <- read.zoo(time_data)

Now, I want to aggregate by week. I could use xts:

td_week_xts <- apply.weekly(z_td, sum)
td_week_xts
#> 2009-01-04 2009-01-06 2009-02-06 2009-03-06 
#>         27         21         25         46

Calling diff somehow doesn't make sense here, as there are gaps in the measurements. The results should include "empty weeks".

diff(td_week_xts)
#> 2009-01-06 2009-02-06 2009-03-06 
#>         -6          4         21

Also, apply.weekly is not very flexible when you want to define the start of the week (at least I don't see this option). And it cuts off the last week. I therefore decided to try to aggregate with my own function weekly:

weekly <- function(x, week_end = 'sunday') {
  days.of.week <- tolower(weekdays(as.Date(3,"1970-01-01",tz="GMT") + 0:6))
  index = which(days.of.week == week_end)-1
  7 * ceiling(as.numeric(x - index + 4)/7) + zoo::as.Date(index - 4)
}

td_week <- as.zooreg(aggregate(z_td, by = weekly, sum), freq= 52)

td_week
#> 2009-01-04 2009-01-11 2009-02-08 2009-03-08 
#>         27         21         25         46

Still gaps, of course, but now actually containing full weeks, and I can also define by which day the week should start. I can now make a "strictly regular" zoo object with:

td_week_strictreg <- as.zooreg(merge(td_week, zoo(, seq(min(time(td_week)), max(time(td_week)), 7)), fill = 0))
td_week_strictreg
#> 2009-01-04 2009-01-11 2009-01-18 2009-01-25 2009-02-01 2009-02-08 
#>         27         21          0          0          0         25 
#> 2009-02-15 2009-02-22 2009-03-01 2009-03-08 
#>          0          0          0         46

diff(td_week) or diff(td_week_strictreg) give the same result:

#> Data:
#> integer(0)
#> 
#> Index:
#> Date of length 0

I assume the problem lies how the time series parameter are set in the zoo/ xts objects, e.g. the frequency of the xts object is 1:

frequency(td_week_xts)
#> [1] 1
frequency(td_week)
#> [1] 52

Or it lies in the indexing: (here as an example aggregating by zoo::as.yearmon, which makes a real Index, other than my custom function...

td_month <- as.zooreg(aggregate(z_td, by = as.yearmon, sum), freq= 12)
str(td_month)
#> 'zooreg' series from Jan 2009 to Mar 2009
#>   Data: int [1:3] 48 25 46
#>   Index:  'yearmon' num [1:3] Jan 2009 Feb 2009 Mar 2009
#>   Frequency: 12

str(td_week)
#> 'zooreg' series from 2009-01-04 to 2009-03-08
#>   Data: int [1:4] 27 21 25 46
#>   Index:  Date[1:4], format: "2009-01-04" "2009-01-11" "2009-02-08" "2009-03-08"
#>   Frequency: 52

Created on 2019-04-02 by the reprex package (v0.2.1)

Apologies for the super long question, I know it's not great, but I didn't know how to be more concise.


I got a lot of help for my approach and the small function from this fabulous answer

tjebo
  • 21,977
  • 7
  • 58
  • 94

3 Answers3

3

Convert td_week to a regularly spaced series and then use diff.xts:

m <- as.xts(merge(td_week, zoo(, seq(start(td_week), end(td_week), 7)), fill = 0))
diff(m)

giving:

             x
2009-01-04  NA
2009-01-11  -6
2009-01-18 -21
2009-01-25   0
2009-02-01   0
2009-02-08  25
2009-02-15 -25
2009-02-22   0
2009-03-01   0
2009-03-08  46
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks, this works. Interesting. By the way - you answering made me realise that I haven't appropriately referenced your functions from other answers. Will update this now - thanks also for that :) – tjebo Apr 02 '19 at 22:17
  • Me again. I don't get why diff works on the `xts` object, but not on the `zooreg` object. Coredata and index are exactly the same in both. And why does diff then work on the monthly aggregated zoo data? :( Need more reading I guess. – tjebo Apr 02 '19 at 22:58
2

In principle the way you set up td_week_strictreg is the right approach (analogous to what @G.Grothendieck did for xts) but the frequency = 52 is incorrect and messes things up.

So simple things first: Simply strip off the frequency with as.zoo() and then you get the same result as for xts - except for NA padding:

td_week_zoo <- as.zoo(td_week_strictreg)
class(td_week_zoo)
## [1] "zoo"
diff(td_week_zoo)
## 2009-01-11 2009-01-18 2009-01-25 2009-02-01 2009-02-08 2009-02-15 2009-02-22 
##         -6        -21          0          0         25        -25          0 
## 2009-03-01 2009-03-08 
##          0         46 

There is nothing wrong with using zooreg instead of zoo but you need to use the correct frequency that corresponds to the underlying numeric time index. As you use a daily (not annual) time index the delta is 7 not 1/52! And the frequency is the inverse of the delta, i.e., 1/7 here:

frequency(td_week_zoo) <- 1/7
class(td_week_zoo)
## [1] "zooreg" "zoo"   
diff(td_week_zoo)
## 2009-01-11 2009-01-18 2009-01-25 2009-02-01 2009-02-08 2009-02-15 2009-02-22 
##         -6        -21          0          0         25        -25          0 
## 2009-03-01 2009-03-08 
##          0         46 

If you want to use a time index where a step of 1/52 takes you to the next week and a step of 1 takes you to the next year, you need to do so:

td_week_zooreg2 <- zooreg(coredata(td_week_zoo), start = 2009, frequency = 52)
time(td_week_zooreg2)
##  [1] 2009.000 2009.019 2009.038 2009.058 2009.077 2009.096 2009.115 2009.135
##  [9] 2009.154 2009.173
diff(td_week_zooreg2)
##  2009(2)  2009(3)  2009(4)  2009(5)  2009(6)  2009(7)  2009(8)  2009(9) 
##       -6      -21        0        0       25      -25        0        0 
## 2009(10) 
##       46 

In principle, it would also be possible to write a dedicated yearweek class where you could associate each week with a particular day in the week (say Sunday) and the corresponding date. I think the reason that nobody has written such a class (to the best of my knowledge) is that you will not always get exactly 52 Sundays in a year.

And the reason that your td_week_strictreg does not lead to an error is that zooreg just checks whether a frequency of 52 would be possible. And it is: You could have another observation every 1/52 days (approximately 27.7 minutes). And then when you take the diff() it wants to take the difference between observation and the corresponding observation 27.7 minutes earlier. But as the latter do not exist you get only NAs which are dropped resulting in an empty object.

Achim Zeileis
  • 15,710
  • 1
  • 39
  • 49
  • 1
    that's super useful. Thanks many times !!! would love to give two upvotes here. Unfortunately, I have already accepted the other answer ! – tjebo Apr 03 '19 at 08:44
  • That's alright. I think the combined answers for my co-author and myself should provide the complete picture :-) – Achim Zeileis Apr 03 '19 at 10:31
0

I'm not sure I quite understood what you're trying to do, but maybe first filling in the missing dates with zeros will work?

time_all_possibilities = data.frame(
  day = seq(ymd("2009-01-02"), ymd("2009-03-06"), by = "days"))

time_data = merge(time_data, time_all_possibilities, by = "day", all = T)
time_data$n_daily[is.na(time_data$n_daily)] = 0
Luis
  • 629
  • 4
  • 9
  • that's an interesting approach, however, it works not on my customly aggregated data, unfortunately (but it works on the xts object). (btw you forgot `library(lubridate)`;) – tjebo Apr 02 '19 at 22:27