2

I've gotten my hands on some data I need to transform i R. The data looks like this:

df <- data.frame(time = 1:100, value = runif(100, min = -20, max = 20))

What I would like to do, is transform the data to a matrix containing running means, up to 5 time periods ahead. It's hard to explain, but an example would be like this.

Original Data

time value
1      2
2      7 
3      8
4     19
5     -5
6    -15
7     4 
8     6
9     12
10    20

And the result would be this matrix/data frame.

time  mean-value(5)      mean-value(4)    mean-value(3)   mean-value(2)    Mean-value(1)
1     (2+7+8+19-5)/5     (2+7+8+19)/4     (2+7+8)/3       (2+7)/2          2/1
2     (7+8+19-5-15)/5    (7+8+19-5)/4     (7+8+19)/3      (7+8)/2          7/1
3     (8+19-5-15+4)/5    .....
....
....
96    na                 numbers/4         numbers/3      numbers/2        numbers/1
97    na                 na                numbers/3       .....                    

Im am at a complete loss, I've tried some reshaping, but it doesn't get right. In the end it should also just give NA's if there is not enough time ahead observations to calculate.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Thorst
  • 1,590
  • 1
  • 21
  • 35
  • HAve you looked at this answer? http://stats.stackexchange.com/questions/3051/mean-of-a-sliding-window-in-r – infominer Apr 12 '14 at 18:23

2 Answers2

3

Here's one way using data.table. There may very well be improvements to this answer or even better answers entirely.

Get the data.table:

require(data.table) ## >= 1.9.2
dat <- read.table(header=TRUE, text="time value
         1     2
         2     7 
         3     8
         4    19
         5    -5
         6   -15
         7     4 
         8     6
         9    12
        10    20")

# convert to `data.table` by reference:
setDT(dat)

Generate all means:

N = 5L
grp = seq_len(N);
ans = dat[, { 
              ix = .I:(.I+N-1L);
              vx = cumsum(dat$value[ix]);
              list(grp=grp, val=rev(vx/grp))
            }, by=time]

Check ?data.table to read about .I (which is a special variable that contains the row number of dat corresponding to each group).

Cast it to wide format:

dcast.data.table(ans, time ~ grp, value.var="val")

    time   1     2          3     4   5
 1:    1 6.2  9.00  5.6666667   4.5   2
 2:    2 2.8  7.25 11.3333333   7.5   7
 3:    3 2.2  1.75  7.3333333  13.5   8
 4:    4 1.8  0.75 -0.3333333   7.0  19
 5:    5 0.4 -2.50 -5.3333333 -10.0  -5
 6:    6 5.4  1.75 -1.6666667  -5.5 -15
 7:    7  NA 10.50  7.3333333   5.0   4
 8:    8  NA    NA 12.6666667   9.0   6
 9:    9  NA    NA         NA  16.0  12
10:   10  NA    NA         NA    NA  20
Arun
  • 116,683
  • 26
  • 284
  • 387
2

Adapting the answer here, you can get what you want pretty easily using filter:

sapply(5:1, function(z) rev(filter(rev(df$value), rep(1/z,z), sides=1)))

Here's the result on your example data:

      [,1]  [,2]       [,3]  [,4] [,5]
 [1,]  6.2  9.00  5.6666667   4.5    2
 [2,]  2.8  7.25 11.3333333   7.5    7
 [3,]  2.2  1.75  7.3333333  13.5    8
 [4,]  1.8  0.75 -0.3333333   7.0   19
 [5,]  0.4 -2.50 -5.3333333 -10.0   -5
 [6,]  5.4  1.75 -1.6666667  -5.5  -15
 [7,]   NA 10.50  7.3333333   5.0    4
 [8,]   NA    NA 12.6666667   9.0    6
 [9,]   NA    NA         NA  16.0   12
[10,]   NA    NA         NA    NA   20
Community
  • 1
  • 1
Thomas
  • 43,637
  • 12
  • 109
  • 140