
I've encountered this problem several times over the years, so maybe I'm just misunderstanding something or being silly about it. I've run into a weird problem when computing a rolling average on an irregular time series. A good overview of the methods available in packages and of simple scripts is here: Calculating moving average in R

I may be misreading some of them, but I see an issue in dealing with irregular time series. For example, the common rolling-mean method in the zoo package requires a unique value for each time point. In that case the result is not the rolling average of the data but a rolling average of per-time-unit averages: time units with fewer data points have comparatively more influence on the average than time units with more.

A true moving average, it seems to me, needs to work not with aggregates but with the full distribution of raw values behind each calculated point.
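The difference can be seen on a tiny made-up example (the data below is hypothetical and chosen only so the arithmetic is easy to follow). A mean of per-year means gives every year equal weight, while a mean over the raw rows gives every observation equal weight; weighting the yearly means by their row counts recovers the pooled mean.

```r
# Hypothetical toy data: year 2000 has three observations, 2001 only one
x <- data.frame(year = c(2000, 2000, 2000, 2001), value = c(1, 2, 3, 10))

# Average of per-year averages: each year counts once, whatever its size
yearly_means <- aggregate(value ~ year, data = x, FUN = mean)$value      # c(2, 10)
avg_of_avgs <- mean(yearly_means)                                       # 6

# Average over the raw observations: each observation counts once
pooled <- mean(x$value)                                                 # 4

# Weighting the yearly means by their row counts recovers the pooled mean
yearly_counts <- aggregate(value ~ year, data = x, FUN = length)$value  # c(3, 1)
weighted_avg <- weighted.mean(yearly_means, yearly_counts)              # 4
```

The single observation in 2001 pulls the average of averages up to 6, while the pooled mean stays at 4.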

Given the following data frame of irregular values, how can I best create a moving average measure for each of the value columns?

# 31 observations spread unevenly over 2000-2010 (no rows for 2002 and 2003)
df <- data.frame(year = c(rep(2000,3),rep(2001,1),rep(2004,4),rep(2005,3),
      rep(2006,3),rep(2007,1),rep(2008,2),rep(2009,6),rep(2010,8)),
      value1=rnorm(31), value2=rnorm(31), value3=rnorm(31))

I found an easy way to do it via subsetting, which I'll post as an initial answer, but it only works in limited circumstances and needs to be customized each time. I'm wondering what a general solution would be. Also, if anyone can comment on the practice of using averages of averages versus averages of the underlying distributions in rolling mean calculations, that would be extra helpful. Thanks!

jangorecki
puslet88

2 Answers


Calculations within a sliding or rolling window of an irregular time series can be solved by data.table's ability to aggregate in a non-equi join.

There are many similar questions, e.g., r calculating rolling average with window based on value (not number of rows or date/time variable) or Rolling regression on irregular time series.

However, this question is different and thus deserves an answer of its own. From the OP's own answer it can be concluded that the OP is looking for a centred rolling window. In addition, the rolling mean is to be computed for several columns.
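What the non-equi join computes can be spelled out in plain base R. The sketch below is not the answer's method and is much less efficient, because `merge(..., by = NULL)` materializes every window/row pair; it only makes explicit that each window is averaged over the raw rows, not over yearly means. It reuses the answer's sample data and the same half-open window `[year - 2, year + 2)`; the names `centres`, `windows` and `pairs` are illustrative.

```r
set.seed(123)   # same sample data as in the answer
df <- data.frame(
  year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
  value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))

# One row per window: the centre year and its [start, end) bounds
centres <- (min(df$year) + 2):(max(df$year) - 2)
windows <- data.frame(centre = centres, start = centres - 2, end = centres + 2)

# Cross join every window with every raw row, then keep only the pairs
# that satisfy the non-equi condition start <= year < end
pairs <- merge(windows, df, by = NULL)
pairs <- pairs[pairs$year >= pairs$start & pairs$year < pairs$end, ]

# Aggregate the raw rows per window: a pooled mean, not a mean of yearly means
res <- aggregate(cbind(value2, value3) ~ centre, data = pairs, FUN = mean)
print(res)
```

Here `centre` plays the role of `year` in the data.table output.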

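library(data.table)
cols <- c("value2", "value3")   # columns to average
# One window per centre year with bounds [year - 2, year + 2); the non-equi join
# collects the raw rows in each window and by = .EACHI averages them per window
setDT(df)[SJ(year = (min(year) + 2):(max(year) - 2))[, c("start", "end") := .(year - 2, year + 2)],
   on = .(year >= start, year < end),
   c(.(year = i.year), lapply(.SD, mean)), .SDcols = cols, by = .EACHI][, -(1:2)]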
   year      value2      value3
1: 2002  0.57494219 -0.53001134
2: 2003  0.33925292  0.75541896
3: 2004 -0.05834453  0.23987209
4: 2005  0.17031099  0.13074666
5: 2006  0.05272739  0.09297215
6: 2007 -0.12935805 -0.38780964
7: 2008  0.19716437 -0.11587017

The result is identical to rmeans from the OP's own answer.

Data

set.seed(123)   # ensure reproducible sample data
df <- data.frame(
  year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
  value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))
Community
Uwe
    This question seems ancient already, but thanks a lot! This looks very efficient. :) I've also moved to data.table since then, and have used a similar but less compact strategy. – puslet88 Mar 04 '19 at 22:55

So here is the simple subsetting I came up with. It could be helpful if anyone runs into the same issue:

df <- data.frame(year = c(rep(2000,3),rep(2001,1),rep(2004,4),rep(2005,3),
rep(2006,3),rep(2007,1),rep(2008,2),rep(2009,6),rep(2010,8)),
value1=rnorm(31), value2=rnorm(31), value3=rnorm(31))

rmeans <- data.frame()
# For each centre year i, average value2 and value3 over all raw rows with
# year in [i - 2, i + 2); -c(1,2) drops the year and value1 columns
for (i in (min(df$year)+2):(max(df$year)-2)){
  rmeans <- rbind(rmeans, data.frame(year=i, as.data.frame.list(colMeans(
    df[df$year>=(i-2) & df$year<(i+2), -c(1,2)]))))
}
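To avoid re-customizing the loop each time, it could be wrapped in a function. `rolling_pooled_means()` below is a hypothetical helper, not from any package: `k` is the window half-width, `cols` the columns to average, and `include_upper` switches between the loop's half-open window `[t - k, t + k)` and a symmetric closed window `[t - k, t + k]`. It builds one row per window with `lapply()` and binds them once, instead of growing the data frame with `rbind()` inside the loop.

```r
# Hypothetical helper (not from any package): pooled rolling means over raw rows
rolling_pooled_means <- function(data, cols, k = 2, time = "year",
                                 include_upper = FALSE) {
  times <- data[[time]]
  centres <- (min(times) + k):(max(times) - k)
  rows <- lapply(centres, function(ctr) {
    # Lower bound is always inclusive; upper bound depends on include_upper
    upper_ok <- if (include_upper) times <= ctr + k else times < ctr + k
    in_window <- times >= ctr - k & upper_ok
    # Average the raw observations in the window, not per-time-unit means
    means <- colMeans(data[in_window, cols, drop = FALSE])
    data.frame(centre = ctr, as.list(means))
  })
  do.call(rbind, rows)
}

# Usage on data shaped like the question's (same uneven year counts)
set.seed(42)
toy <- data.frame(
  year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
  value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))
res_open <- rolling_pooled_means(toy, cols = c("value2", "value3"), k = 2)
res_closed <- rolling_pooled_means(toy, cols = c("value2", "value3"), k = 2,
                                   include_upper = TRUE)
print(res_open)
```

A window that contains no rows yields `NaN` from `colMeans()`; whether to drop or keep such windows is left to the caller.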
puslet88