First, a simple R example of my problem:

> df
   x  y
1  1  3
2  2  7
3  4  9
4  8  0
5  3  1
6 12 24

For each column, I'd like to compute, for each row, the mean of that row and the two rows that follow it, which results in the following data frame.

> dfRes
      xRes     yRes
         x        y
1 2.333333 6.333333
2 4.666667 5.333333
3 5.000000 3.333333
4 7.666667 8.333333
5       NA       NA
6       NA       NA

Since I have a very big data frame with many columns and rows, I'd like to avoid using a for loop for this computation. I've tried defining a custom function to use with sapply.

Does somebody know a simple solution in R with a relatively fast computation time for this problem?

---- Update ---- The calculation procedure should be something like:

xRes[1] = mean(c(x[1], x[2], x[3]))
xRes[2] = mean(c(x[2], x[3], x[4]))
...
xRes[5] = NA # because there is no x[7]
Liv-Con
  • How exactly did you compute the results in the sample dataset? – tmfmnk Dec 30 '20 at 13:29
  • Does this answer your question? [Calculate the mean of every 13 rows in data frame](https://stackoverflow.com/questions/30359427/calculate-the-mean-of-every-13-rows-in-data-frame) – tjebo Dec 30 '20 at 13:34
  • How is the first value 3.5 or second value 14? – Ronak Shah Dec 30 '20 at 13:35
  • For example, `xRes[1]` is the mean of `x[1]`, `x[2]` and `x[3]`, `xRes[2]` is the mean of `x[2]`, `x[3]` and `x[4]`, etc. For `x[5]` it isn't possible to calculate the mean, because the 7th row value is missing. I'm updating my question to make this clearer. – Liv-Con Dec 30 '20 at 13:37
  • In general you can get the previous values in a column by lag(x, i) with i denoting how far back you go. So (lag(x) + lag(x,2) + lag(x,3))/3 will give you the mean you want. You don't even need to define a new function for this. Just use dplyr and use mutate across all columns. – Mario Niepel Dec 30 '20 at 13:37
  • That's right. The `dfRes` results were wrong. I have updated the question. – Liv-Con Dec 30 '20 at 13:48
  • The function provided by Ronak Shah is exactly what I was looking for. Thank you all for your comments. – Liv-Con Dec 30 '20 at 13:58

5 Answers

1

You can use rolling functions with an appropriate window size.

df[] <- lapply(df, zoo::rollmean, 3, fill = NA, align = 'left')

If your data is huge, using data.table might help.

library(data.table)
setDT(df)[, lapply(.SD, frollmean, 3, align = 'left')]
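As a cross-check that needs neither package, the same left-aligned window of 3 can be sketched in base R with cumulative sums. `roll_mean_left` and `dat` are hypothetical names introduced here for illustration, not part of zoo or data.table, and the sketch assumes the columns contain no `NA` values (an `NA` would propagate through `cumsum`).

```r
# Hypothetical helper: left-aligned rolling mean of window size k,
# padded with NA where the window would run past the last row.
roll_mean_left <- function(x, k = 3) {
  n <- length(x)
  if (n < k) return(rep(NA_real_, n))
  cs <- cumsum(c(0, x))            # cs[i + 1] holds sum(x[1:i])
  starts <- seq_len(n - k + 1)     # rows that have a full window
  c((cs[starts + k] - cs[starts]) / k, rep(NA_real_, k - 1))
}

# Sample data from the question
dat <- data.frame(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

res <- dat
res[] <- lapply(dat, roll_mean_left, k = 3)
res
```

On the question's sample data this should reproduce the `dfRes` values, with `NA` in the last two rows.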
Ronak Shah
0

How about the map function from purrr? https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/map It's actually a really good alternative to a for loop.

0

I'd transform each column into a matrix with three rows (the nrow argument of matrix) and use colMeans on it. The solution is basically copied from this answer: https://stackoverflow.com/a/25151007/7941188

# devtools::install_github("alistaire47/read.so")
foo <- read.so::read.so(
"
   x  y
1  1  3
2  2  7
3  4  9
4  8  0
5  3  1
6 12 24")

foo_mat <- as.matrix(foo)

sapply(foo, function(x) colMeans(matrix(x, nrow = 3)))
#>             x        y
#> [1,] 2.333333 6.333333
#> [2,] 7.666667 8.333333

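This depends on the number of rows being a multiple of 3, and it returns the means of consecutive, non-overlapping blocks of three rows rather than a rolling mean. The other post I linked to has a solution that apparently deals with other cases.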
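One possible way to handle a row count that is not a multiple of 3 is to pad each column with `NA` before reshaping. This is only a sketch, not the solution from the linked post; `block_means` is a hypothetical helper, and note that `na.rm = TRUE` will also drop any genuine `NA` values in the data.

```r
# Hypothetical helper: means of consecutive, non-overlapping blocks of k values.
# The last block may be shorter than k; it is padded with NA and averaged
# over the values that are actually present.
block_means <- function(x, k = 3) {
  pad <- (-length(x)) %% k                     # number of NAs needed to fill the last block
  colMeans(matrix(c(x, rep(NA, pad)), nrow = k), na.rm = TRUE)
}

# Seven values: blocks are (1, 2, 4), (8, 3, 12) and (5)
block_means(c(1, 2, 4, 8, 3, 12, 5))
# block means: 7/3, 23/3, 5
```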

tjebo
0

Using stats::filter, based on the method of this historic answer.

n <- 3
# stats::filter is called explicitly because dplyr masks filter() when it is loaded
df1[] <- lapply(df1, function(x) c(na.omit(stats::filter(x, rep(1 / n, n))), rep(NA, n - 1)))
df1
#          x        y
# 1 2.333333 6.333333
# 2 4.666667 5.333333
# 3 5.000000 3.333333
# 4 7.666667 8.333333
# 5       NA       NA
# 6       NA       NA

Data:

df1 <- structure(list(x = c(1L, 2L, 4L, 8L, 3L, 12L), y = c(3L, 7L, 
9L, 0L, 1L, 24L)), row.names = c("1", "2", "3", "4", 
"5", "6"), class = "data.frame")
jay.sf
0

Using lead and mutate from dplyr:

library(dplyr)
df %>%
  mutate(x = (x + lead(x) + lead(x, 2)) / 3,
         y = (y + lead(y) + lead(y, 2)) / 3)
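Since the question mentions many columns, the same idea can probably be applied to all of them at once with `across()`. This is a sketch that assumes dplyr 1.0.0 or later (where `across()` is available) and that every column is numeric; `dat` and `res` are names chosen here for illustration.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(x = c(1, 2, 4, 8, 3, 12), y = c(3, 7, 9, 0, 1, 24))

# Mean of the current row and the next two rows, for every column.
# lead() returns NA past the end, so the last two rows become NA.
res <- dat %>%
  mutate(across(everything(), ~ (.x + lead(.x) + lead(.x, 2)) / 3))
res
```

For a window much larger than 3 this hand-written sum gets unwieldy, and a dedicated rolling function is likely the better choice.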
kaix