I am struggling to find a way to aggregate a zoo object to weekly totals when there are gaps in the weekly measurements. The goal is to use diff and other functions (e.g. acf) on the results.
library(zoo)
library(xts)
I create a zoo object from a small sample of my data:
time_data <- structure(list(day = structure(c(14246, 14247, 14248, 14249, 14250, 14277, 14278, 14279, 14280, 14281, 14305, 14306, 14307, 14308, 14309), class = "Date"), n_daily = c(10L, 15L, 2L, 15L, 6L, 4L, 6L, 8L, 6L, 1L, 20L, 5L, 8L, 9L, 4L)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"))
z_td <- read.zoo(time_data)
Now, I want to aggregate by week. I could use xts:
td_week_xts <- apply.weekly(z_td, sum)
td_week_xts
#> 2009-01-04 2009-01-06 2009-02-06 2009-03-06
#> 27 21 25 46
Calling diff on this is not meaningful, because there are gaps in the measurements. The result should include the "empty weeks".
diff(td_week_xts)
#> 2009-01-06 2009-02-06 2009-03-06
#> -6 4 21
Also, apply.weekly is not very flexible when you want to define the start of the week (at least I don't see such an option), and it cuts off the last week. I therefore decided to aggregate with my own function, weekly:
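weekly <- function(x, week_end = 'sunday') {
  # 1970-01-04 was a Sunday, so in an English locale this yields "sunday", ..., "saturday"
  days.of.week <- tolower(weekdays(as.Date(3, "1970-01-01", tz = "GMT") + 0:6))
  # zero-based offset of the requested week-ending day
  index <- which(days.of.week == week_end) - 1
  # map each date to the first week_end day on or after it
  7 * ceiling(as.numeric(x - index + 4) / 7) + zoo::as.Date(index - 4)
}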
td_week <- as.zooreg(aggregate(z_td, by = weekly, sum), freq= 52)
td_week
#> 2009-01-04 2009-01-11 2009-02-08 2009-03-08
#> 27 21 25 46
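As a cross-check on the arithmetic in weekly, here is a minimal base-R sketch of the same week-ending calculation. The name week_ending and the hard-coded English day names are assumptions made for illustration (the function above relies on weekdays(), whose output depends on the locale). It is not part of zoo or xts.

```r
# Hypothetical helper: base-R restatement of the week-ending arithmetic.
week_ending <- function(x, week_end = "sunday") {
  # Fixed English names so the lookup does not depend on the locale.
  days <- c("sunday", "monday", "tuesday", "wednesday",
            "thursday", "friday", "saturday")
  index <- match(week_end, days) - 1
  # Day number 3 (1970-01-04) was a Sunday; the +4 / -4 shift makes
  # multiples of 7 line up with the chosen week-ending day.
  n <- as.numeric(x)
  as.Date(7 * ceiling((n - index + 4) / 7) + index - 4, origin = "1970-01-01")
}

sample_days <- as.Date(c("2009-01-02", "2009-01-06", "2009-02-02", "2009-03-06"))
format(week_ending(sample_days))
#> [1] "2009-01-04" "2009-01-11" "2009-02-08" "2009-03-08"
```

These are the same four week labels that the aggregate call produces for the sample data.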
Still gaps, of course, but the index now marks full weeks, and I can choose the day on which the week ends. I can now make a "strictly regular" zoo object with:
td_week_strictreg <- as.zooreg(merge(td_week, zoo(, seq(min(time(td_week)), max(time(td_week)), 7)), fill = 0))
td_week_strictreg
#> 2009-01-04 2009-01-11 2009-01-18 2009-01-25 2009-02-01 2009-02-08
#> 27 21 0 0 0 25
#> 2009-02-15 2009-02-22 2009-03-01 2009-03-08
#> 0 0 0 46
Both diff(td_week) and diff(td_week_strictreg) give the same empty result:
#> Data:
#> integer(0)
#>
#> Index:
#> Date of length 0
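To make the desired result concrete, here is a base-R sketch of the week-over-week differences expected from the padded series. It works on the plain values printed above rather than on the zoo object; the names weeks, totals and expected are only illustrative.

```r
# Weekly index and zero-filled totals, copied from the padded series above.
weeks  <- seq(as.Date("2009-01-04"), as.Date("2009-03-08"), by = "week")
totals <- c(27, 21, 0, 0, 0, 25, 0, 0, 0, 46)

# First differences, labelled by the later week of each pair.
expected <- setNames(diff(totals), format(weeks[-1]))
expected
```

This gives nine differences (-6, -21, 0, 0, 25, -25, 0, 0, 46), one for each week after the first, which is the shape of output I am after from diff on the zoo object.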
I assume the problem lies in how the time series parameters are set in the zoo/xts objects, e.g. the frequency of the xts object is 1:
frequency(td_week_xts)
#> [1] 1
frequency(td_week)
#> [1] 52
Or it lies in the indexing. As an example, here I aggregate by zoo::as.yearmon, which produces a real index, unlike my custom function:
td_month <- as.zooreg(aggregate(z_td, by = as.yearmon, sum), freq= 12)
str(td_month)
#> 'zooreg' series from Jan 2009 to Mar 2009
#> Data: int [1:3] 48 25 46
#> Index: 'yearmon' num [1:3] Jan 2009 Feb 2009 Mar 2009
#> Frequency: 12
str(td_week)
#> 'zooreg' series from 2009-01-04 to 2009-03-08
#> Data: int [1:4] 27 21 25 46
#> Index: Date[1:4], format: "2009-01-04" "2009-01-11" "2009-02-08" "2009-03-08"
#> Frequency: 52
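One way to compare the two indexes is to look at the spacing of their underlying numeric values next to the declared frequency. The base-R sketch below uses only arithmetic. It assumes, without having confirmed this against the zoo source, that the step implied by a frequency f is 1/f in the units of the index, and it writes the yearmon values by hand as year plus (month - 1)/12.

```r
# Weekly Date index: the unit is one day.
idx <- as.Date(c("2009-01-04", "2009-01-11", "2009-02-08", "2009-03-08"))
step_days <- as.numeric(diff(idx))   # spacing between consecutive stamps, in days
step_days
#> [1]  7 28 28

# Monthly index stored as year + (month - 1) / 12 (Jan, Feb, Mar 2009).
ym <- 2009 + (0:2) / 12
all.equal(diff(ym), rep(1 / 12, 2))  # spacing matches 1 / frequency for freq = 12
#> [1] TRUE

# Step implied by freq = 52 under the stated assumption, versus the 7-day spacing.
c(implied_step = 1 / 52, smallest_spacing = min(step_days))
```

If that assumption holds, the monthly index agrees with its frequency while the weekly Date index does not, which may or may not be what trips up diff.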
Created on 2019-04-02 by the reprex package (v0.2.1)
Apologies for the super long question, I know it's not great, but I didn't know how to be more concise.
I got a lot of help with my approach and the small function from this fabulous answer.