
I would like to find a dplyr way to take the average of the next 3 rows. Say I have a data frame:

data <- structure(list(x = 1:6, y = c(32.1056789265246, 3.48493686329687, 8.21300282100191, 6.72266588891445, 27.7353607044612, 18.5963631547696)), .Names = c("x", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

# A tibble: 6 × 2
      x         y
  <int>     <dbl>
1     1 32.105679
2     2  3.484937
3     3  8.213003
4     4  6.722666
5     5 27.735361
6     6 18.596363

I want to generate a new data frame with 3 rows: the first is the average of rows 2, 3, 4; the second of rows 3, 4, 5; and the last of rows 4, 5, 6.

A for loop is probably the easiest way, but I would appreciate a more elegant dplyr way to do it. Thanks!

Pierre L
advance
  • Sounds like you are looking for a rolling mean function. Did you search for it? You could look at the package `zoo` – talat Sep 06 '16 at 14:18
  • Your title and description say next 'columns', did you mean 'rows'? – Pierre L Sep 06 '16 at 14:19
  • Also, some variation of `with(data, filter(y, rep(1/3, 3)))` in base R probably. Or in data.table: ```library(data.table) ; setDT(data)[, Reduce(`+`, shift(y, 0:2))/3]```. And this is an obvious dupe btw (to all the shameless answerers) – David Arenburg Sep 06 '16 at 14:31
  • You can also try http://stackoverflow.com/questions/25809195/rollmean-with-dplyr-and-magrittr for more – Pierre L Sep 06 '16 at 14:38
  • Also http://stackoverflow.com/questions/743812/calculating-moving-average-in-r – David Arenburg Sep 06 '16 at 14:41

3 Answers

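You can use the rollmean() function from the zoo package, with lapply() to apply it to each column. The window is centered by default, so each result row is labeled with the middle x of its window:

library(zoo)
# Rolling mean of width 3 over every column (x and y)
as.data.frame(lapply(data, rollmean, 3))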

#  x         y
#1 2 14.601206
#2 3  6.140202
#3 4 14.223676
#4 5 17.684797

If you don't need the first row (the mean of rows 1 to 3), drop the first row of data before averaging:

as.data.frame(lapply(data[-1, ], rollmean, 3))
#  x         y
#1 3  6.140202
#2 4 14.223676
#3 5 17.684797
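
For readers without zoo installed, the same width-3 rolling mean can be sketched in base R. This is only an illustration, not part of the answer above: it assumes the `data` values from the question, and the names `windows`, `roll`, and `next3` are made up for the example.

```r
# Data from the question, rebuilt as a plain data frame
data <- data.frame(
  x = 1:6,
  y = c(32.1056789265246, 3.48493686329687, 8.21300282100191,
        6.72266588891445, 27.7353607044612, 18.5963631547696)
)

k <- 3  # window width

# embed() puts every run of k consecutive values into one matrix row
# (values appear in reverse order within a row, which does not affect the mean)
windows <- embed(data$y, k)

# One mean per window: rows 1-3, 2-4, 3-5, 4-6
roll <- rowMeans(windows)

# Drop the first window to keep only rows 2-4, 3-5, 4-6 (hypothetical name)
next3 <- roll[-1]
print(next3)
```

Dropping the first element of `roll` plays the same role as removing the first row of data before calling rollmean().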
Psidom

You can use the RcppRoll package to do that as follows:

library(RcppRoll)
# Drop the first value, then take the rolling mean over windows of 3
roll_mean(data$y[-1], 3) ## 6.140202 14.223676 17.684797

As I am not sure what output you are looking for, you could also do:

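library(dplyr)
data %>% 
  # roll_meanl() is left-aligned: each row gets the mean of itself and the next two rows
  mutate(rmean = roll_meanl(y, 3)) %>% 
  # keep the windows starting at rows 2, 3 and 4
  filter(between(x, 2, 4)) %>% 
  select(-y)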

Which results in:

# A tibble: 3 × 2
      x     rmean
  <int>     <dbl>
1     2  6.140202
2     3 14.223676
3     4 17.684797
Rentrop

Given that you asked specifically about dplyr, you could try this:

library(dplyr)

data %>% 
  mutate(av3 = (lead(y, n=1L) + lead(y, n=2L) + lead(y, n=3L))/3)

Which creates:

# A tibble: 6 × 3
      x         y       av3
  <int>     <dbl>     <dbl>
1     1 32.105679  6.140202
2     2  3.484937 14.223676
3     3  8.213003 17.684797
4     4  6.722666        NA
5     5 27.735361        NA
6     6 18.596363        NA
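
If the window size may change, the three lead() terms could be generated rather than typed out. The following is a minimal sketch under that assumption; `next_n_mean` is a hypothetical helper, not a dplyr function, and the final filter() and select() steps are just one way to reduce the result to the three rows the question asks for.

```r
library(dplyr)

# Data from the question, rebuilt as a plain data frame
data <- data.frame(
  x = 1:6,
  y = c(32.1056789265246, 3.48493686329687, 8.21300282100191,
        6.72266588891445, 27.7353607044612, 18.5963631547696)
)

# Hypothetical helper: mean of the n values that follow each position.
# Positions with fewer than n following values come out as NA,
# because lead() pads with NA past the end of the vector.
next_n_mean <- function(v, n) {
  shifted <- lapply(seq_len(n), function(k) lead(v, k))
  Reduce(`+`, shifted) / n
}

result <- data %>%
  mutate(av3 = next_n_mean(y, 3)) %>%
  filter(!is.na(av3)) %>%   # keep only rows with a complete window
  select(x, av3)

print(result)
```

Here the row with x = 1 holds the average of rows 2, 3, 4, matching the av3 column shown above.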
p0bs