Computing for each dataframe column the mean of three subsequent rows for each row (R)

Question

First, a simple R example of my problem:

For each column and each row, I'd like to compute the mean of that row and the two rows that follow it, which results in the following dataframe.

> dfRes
      xRes     yRes
1 2.333333 6.333333
2 4.666667 5.333333
3 5.000000 3.333333
4 7.666667 8.333333
5       NA       NA
6       NA       NA

Since I've got a very big dataframe with many columns and rows, I'd like to avoid using a for loop for this computation. I've tried defining a custom function to use with sapply.

Does somebody know a simple solution in R with a relatively fast computation time for this problem?

---- Update ---- The calculation procedure should be something like:

xRes[1] = mean(c(x[1], x[2], x[3]))
xRes[2] = mean(c(x[2], x[3], x[4]))
...
xRes[5] = NA # because there is no x[7]

How exactly did you compute the results in the sample dataset? — tmfmnk, Dec 30 '20 at 13:29
Does this answer your question? [Calculate the mean of every 13 rows in data frame](https://stackoverflow.com/questions/30359427/calculate-the-mean-of-every-13-rows-in-data-frame) — tjebo, Dec 30 '20 at 13:34
For example `x[1] + x[2] + x[3] = xRes[1]`, `x[2] + x[3] + x[4] = xRes[2]`, etc. For x[5] it isn't possible to calculate the mean, because we are missing the 7th row value. I'm updating my question to make this clearer. — Liv-Con, Dec 30 '20 at 13:37
In general you can get the previous values in a column by lag(x, i) with i denoting how far back you go. So (lag(x) + lag(x,2) + lag(x,3))/3 will give you the mean you want. You don't even need to define a new function for this. Just use dplyr and use mutate across all columns. — Mario Niepel, Dec 30 '20 at 13:37
That's right. The `dfRes` results were wrong. I have updated the question. — Liv-Con, Dec 30 '20 at 13:48
The function provided by Ronak Shah is exactly what I was looking for. Thank you all for your comments. — Liv-Con, Dec 30 '20 at 13:58

score 1 · Accepted Answer · answered Dec 30 '20 at 13:43

You can use rolling functions with an appropriate window size.

df[] <- lapply(df, zoo::rollmean, 3, fill = NA, align = 'left')
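
To make explicit what a left-aligned window of width 3 computes, here is a minimal base R sketch on the six-row sample data (values taken from another answer in this thread). The helper name `roll_mean_left` is hypothetical; the sketch only illustrates the windowing and is not meant as a replacement for `zoo::rollmean`.

```r
# Sample data as posted in another answer in this thread
df <- data.frame(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

# Hypothetical helper: mean of each value and the (k - 1) values after it
roll_mean_left <- function(x, k = 3) {
  n <- length(x)
  if (n < k) return(rep(NA_real_, n))
  # embed() returns one row per complete window of k consecutive values
  c(rowMeans(embed(x, k)), rep(NA_real_, k - 1))
}

res <- as.data.frame(lapply(df, roll_mean_left))
print(res)
```

The last `k - 1` rows are `NA` because their windows would run past the end of the column.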

If your data is huge, using data.table might help.

library(data.table)
setDT(df)[, lapply(.SD, frollmean, 3, align = 'left')]
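
Note that the `data.table` line above returns a new table; the values stored in `df` stay as they were (although `setDT()` does convert `df` to a data.table by reference). If the columns should be overwritten, one option is to assign by reference with `:=`. A sketch, assuming the sample data from this thread; the names `dt` and `cols` are only illustrative.

```r
library(data.table)

# Sample data as posted in another answer in this thread
dt <- data.table(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

# Overwrite every column with its left-aligned rolling mean of width 3
cols <- names(dt)
dt[, (cols) := lapply(.SD, frollmean, n = 3, align = "left"), .SDcols = cols]
print(dt)
```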

Ronak Shah


score 0 · Answer 2 · answered Dec 30 '20 at 13:32

How about the map function? https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/map It's actually a really good alternative to a for loop.

123 bodyuilding


score 0 · Answer 3 · answered Dec 30 '20 at 13:36

I'd reshape each column into a matrix with three rows (nrow = 3) and use colMeans on it. The solution is basically copied from this answer: https://stackoverflow.com/a/25151007/7941188

# devtools::install_github("alistaire47/read.so")
foo <- read.so::read.so(
"
   x  y
1  1  3
2  2  7
3  4  9
4  8  0
5  3  1
6 12 24")

foo_mat <- as.matrix(foo)

sapply(foo, function(x) colMeans(matrix(x, nrow = 3)))
#>             x        y
#> [1,] 2.333333 6.333333
#> [2,] 7.666667 8.333333

This depends on the number of rows being a multiple of 3. Note that it averages non-overlapping blocks of three rows rather than a rolling window. The other post I linked to has a solution that apparently deals with other cases.
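
For a row count that is not a multiple of 3, one possible way to get the same block means is to group rows by an integer block index. This is a sketch, not the method from the linked post; the seventh row of the data and the names `block` and `res` are made up for illustration.

```r
# Sample data from this answer plus one hypothetical extra row (7 rows)
foo <- data.frame(x = c(1, 2, 4, 8, 3, 12, 5), y = c(3, 7, 9, 0, 1, 24, 6))

# Block index: rows 1-3 -> 0, rows 4-6 -> 1, row 7 -> 2
block <- (seq_len(nrow(foo)) - 1) %/% 3

# Mean of every column within each block; the last block may be shorter
res <- aggregate(foo, by = list(block = block), FUN = mean)
print(res)
```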

score 0 · Answer 4 · answered Dec 30 '20 at 14:09

Using filter, based on the method of this historic answer.

n <- 3
# stats::filter is called explicitly so that dplyr::filter cannot mask it
df1[] <- lapply(df1, function(x) c(na.omit(stats::filter(x, rep(1 / n, n))), rep(NA, n - 1)))
df1
#          x        y
# 1 2.333333 6.333333
# 2 4.666667 5.333333
# 3 5.000000 3.333333
# 4 7.666667 8.333333
# 5       NA       NA
# 6       NA       NA
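
As far as I can tell, the `na.omit()` step above assumes the columns have no missing values of their own. A variant that avoids it is a one-sided filter followed by a shift. This is only a sketch; the helper name `roll_mean_filter` and the column with a gap are hypothetical.

```r
roll_mean_filter <- function(x, n = 3) {
  stopifnot(n >= 1, n <= length(x))
  # sides = 1: each value is the mean of the current and the n - 1 previous values
  trailing <- as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
  # Shift up by n - 1 so each row holds the mean of itself and the next n - 1 rows
  c(trailing[n:length(x)], rep(NA_real_, n - 1))
}

x_gap <- c(1, 2, NA, 8, 3, 12, 6)  # hypothetical column with a missing value
res <- roll_mean_filter(x_gap)
print(res)
```

Any window that touches the missing value stays `NA`, and the result keeps the same length as the input.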

Data:

df1 <- data.frame(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

score 0 · Answer 5 · answered Dec 31 '20 at 02:07

Using lead and mutate from dplyr (lag(x, 0) is just x, so it is not needed):

library(dplyr)

df %>%
  mutate(x = (x + lead(x) + lead(x, 2)) / 3,
         y = (y + lead(y) + lead(y, 2)) / 3)
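
Since the question mentions many columns, spelling out each column gets tedious. A sketch of the same idea applied to every column with `across()`, assuming dplyr 1.0 or later and the sample data from this thread:

```r
library(dplyr)

# Sample data as posted in another answer in this thread
df <- data.frame(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

# Apply the same three-row forward mean to all columns
res <- df %>%
  mutate(across(everything(), ~ (.x + lead(.x) + lead(.x, 2)) / 3))
print(res)
```

`everything()` can be swapped for a narrower selection such as `where(is.numeric)` if the dataframe also holds non-numeric columns.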

kaix

