Given some data like the following:
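library(tibble)  # tibble()
library(dplyr)   # %>%, group_split(), mutate(), bind_rows()
set.seed(1234)
df <- tibble(class = rep(c("a", "b"), each = 6),
             value = c(rnorm(n = 6, mean = 0, sd = 1), rnorm(n = 6, mean = 1, sd = 0.1)))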
# A tibble: 12 x 2
# class value
# <chr> <dbl>
# 1 a -1.21
# 2 a 0.277
# 3 a 1.08
# 4 a -2.35
# 5 a 0.429
# 6 a 0.506
# 7 b 0.943
# 8 b 0.945
# 9 b 0.944
#10 b 0.911
#11 b 0.952
#12 b 0.900
I'm trying to generate a new column (context) that contains the average of "value" over the X preceding and X following rows, whenever those rows exist. Ideally this would be computed separately within each level of a grouping column (here, "class"). For example, for X=2, I would expect something like the following:
# A tibble: 12 x 3
# class value context
# <chr> <dbl> <dbl>
# 1 a -1.21 NA
# 2 a 0.277 NA
# 3 a 1.08 -0.7135
# 4 a -2.35 0.573
# 5 a 0.429 NA
# 6 a 0.506 NA
# 7 b 0.943 NA
# 8 b 0.945 NA
# 9 b 0.944 0.9377
#10 b 0.911 0.9353
#11 b 0.952 NA
#12 b 0.900 NA
Note that for the first two rows it is not possible to generate the context value in this case, because they do not have X=2 preceding rows. The value -0.7135 at row 3 is the average of rows 1, 2, 4 and 5; the row itself is excluded.
Similarly, rows 5 and 6 do not have a context value, because they are not followed by two rows belonging to the same level of the factor "class" (row 7 is class="b", while rows 5 and 6 are class="a").
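To make the rule concrete, here is a small check of the row 3 value, using the rounded numbers as printed in the tibble above. The names `a`, `i` and `neighbours` are only for illustration.

```r
# Values of class "a", rounded as printed in the tibble above
a <- c(-1.21, 0.277, 1.08, -2.35, 0.429, 0.506)
X <- 2  # rows to take on each side
i <- 3  # row whose context we want

# X rows before and X rows after row i; row i itself is left out
neighbours <- c(a[(i - X):(i - 1)], a[(i + 1):(i + X)])
mean(neighbours)  # approximately -0.7135
```

The same indexing fails for i = 1, 2, 5 and 6, which is why those rows get NA.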
I do not know whether there is a standard way to do this in R. I haven't found any similar questions, and I have only come up with solutions like the following one, which I suspect is not idiomatic for this language.
My solution:
X <- 2  # number of rows to average on each side
df_list <- df %>% dplyr::group_split(class)  # one tibble per class
result <- tibble()
for (i in seq_along(df_list)) {
  tmp <- df_list[[i]]
  context <- numeric(0)
  for (j in seq_len(nrow(tmp))) {
    # No full window of X rows on both sides within this class: NA
    if (j <= X || j > nrow(tmp) - X) {
      context <- c(context, NA)
    } else {
      values <- numeric(0)
      for (k in seq_len(X)) {
        values <- c(values, tmp$value[j - k], tmp$value[j + k])
      }
      context <- c(context, mean(values))
    }
  }
  tmp <- tmp %>% dplyr::mutate(context = context)
  result <- result %>% dplyr::bind_rows(tmp)
}
This gives approximately the result shown above (the differences are due to rounding). But this approach lacks flexibility, e.g. if we want to create several columns at once for different values of X. Are there R functions designed to solve tasks like this one (e.g. vectorized functions)?
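For reference, this is roughly the shape of solution I am hoping for, sketched with `dplyr::lag()` and `dplyr::lead()` inside a grouped `mutate()`. The helper name `context_mean` is made up for this sketch, and it assumes X >= 1 and that the rows are already in the desired order within each class. I am not sure this is the best or most idiomatic way.

```r
library(tibble)
library(dplyr)

set.seed(1234)
df <- tibble(class = rep(c("a", "b"), each = 6),
             value = c(rnorm(n = 6, mean = 0, sd = 1), rnorm(n = 6, mean = 1, sd = 0.1)))

# Hypothetical helper (not from any package): mean of the X values before
# and the X values after each position, NA where the window does not fit.
context_mean <- function(v, X) {
  # One shifted copy of v per offset 1..X, in both directions
  before <- lapply(seq_len(X), function(k) dplyr::lag(v, k))
  after  <- lapply(seq_len(X), function(k) dplyr::lead(v, k))
  shifted <- do.call(cbind, c(before, after))  # length(v) rows, 2 * X columns
  # rowMeans() returns NA for any row containing an NA, i.e. wherever
  # lag()/lead() ran past the start or end of the group
  rowMeans(shifted)
}

result <- df %>%
  group_by(class) %>%
  mutate(context_1 = context_mean(value, 1),
         context_2 = context_mean(value, 2)) %>%
  ungroup()
```

Because the helper is called inside a grouped `mutate()`, the shifts never cross from class "a" into class "b", and adding a column for another value of X is one more argument to `mutate()`.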