Reshape a dataframe into stacks of average values

Question

I've gotten my hands on some data I need to transform i R. The data looks like this:

df <- data.frame(time = 1:100, value = runif(100, min = -20, max = 20))

What I would like to do, is transform the data to a matrix containing running means, up to 5 time periods ahead. It's hard to explain, but an example would be like this.

Original Data

And the result would be this matrix/data frame.

time  mean-value(5)      mean-value(4)    mean-value(3)   mean-value(2)    Mean-value(1)
1     (2+7+8+19-5)/5     (2+7+8+19)/4     (2+7+8)/3       (2+7)/2          2/1
2     (7+8+19-5-15)/5    (7+8+19-5)/4     (7+8+19)/3      (7+8)/2          7/1
3     (8+19-5-15+4)/5    .....
....
....
96    na                 numbers/4         numbers/3      numbers/2        numbers/1
97    na                 na                numbers/3       .....

Im am at a complete loss, I've tried some reshaping, but it doesn't get right. In the end it should also just give NA's if there is not enough time ahead observations to calculate.

HAve you looked at this answer? http://stats.stackexchange.com/questions/3051/mean-of-a-sliding-window-in-r — infominer, Apr 12 '14 at 18:23

Arun · Answer 1 · 2014-04-12T18:56:13.483

Here's one way using data.table. There may very well be improvements to this answer or even better answers entirely.

Get the data.table:

require(data.table) ## >= 1.9.2
dat <- read.table(header=TRUE, text="time value
         1     2
         2     7 
         3     8
         4    19
         5    -5
         6   -15
         7     4 
         8     6
         9    12
        10    20")

# convert to `data.table` by reference:
setDT(dat)

Generate all means:

N = 5L
grp = seq_len(N);
ans = dat[, { 
              ix = .I:(.I+N-1L);
              vx = cumsum(dat$value[ix]);
              list(grp=grp, val=rev(vx/grp))
            }, by=time]

Check ?data.table to read about .I (which is a special variable that contains the row number of dat corresponding to each group).

Cast it to wide format:

dcast.data.table(ans, time ~ grp, value.var="val")

    time   1     2          3     4   5
 1:    1 6.2  9.00  5.6666667   4.5   2
 2:    2 2.8  7.25 11.3333333   7.5   7
 3:    3 2.2  1.75  7.3333333  13.5   8
 4:    4 1.8  0.75 -0.3333333   7.0  19
 5:    5 0.4 -2.50 -5.3333333 -10.0  -5
 6:    6 5.4  1.75 -1.6666667  -5.5 -15
 7:    7  NA 10.50  7.3333333   5.0   4
 8:    8  NA    NA 12.6666667   9.0   6
 9:    9  NA    NA         NA  16.0  12
10:   10  NA    NA         NA    NA  20

score 2 · Accepted Answer · edited May 23 '17 at 12:27

Adapting the answer here, you can get what you want pretty easily using filter:

sapply(5:1, function(z) rev(filter(rev(df$value), rep(1/z,z), sides=1)))

Here's the result on your example data:

      [,1]  [,2]       [,3]  [,4] [,5]
 [1,]  6.2  9.00  5.6666667   4.5    2
 [2,]  2.8  7.25 11.3333333   7.5    7
 [3,]  2.2  1.75  7.3333333  13.5    8
 [4,]  1.8  0.75 -0.3333333   7.0   19
 [5,]  0.4 -2.50 -5.3333333 -10.0   -5
 [6,]  5.4  1.75 -1.6666667  -5.5  -15
 [7,]   NA 10.50  7.3333333   5.0    4
 [8,]   NA    NA 12.6666667   9.0    6
 [9,]   NA    NA         NA  16.0   12
[10,]   NA    NA         NA    NA   20

Reshape a dataframe into stacks of average values

2 Answers2