0
df <- structure(
  list(
    inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1", "INV_2", "INV_2", "INV_2", "INV_2", "INV_2", "INV_2"),
    ass = c("x", "x", "x", "y", "y", "x", "x", "x", "t", "t", "t"),
    datetime = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19", "2010-02-20", "2010-02-22", "2010-02-23", "2010-03-01", "2010-03-02", "2010-03-04"),
    price = c(10, 10, 19, 9, 3 , 5, 1, 4, 4, 5, 1),
    operation = c(10, 0, 2, 2, 0, 5, 5, 5, 3, 0, 2)
  ),
  class = "data.frame", row.names = c(NA, -11L)
)

I have this data frame for which I want to compute the moving average of the "price" column.

In particular I need to apply a minor change before doing that. I want to change the "price" column value if the "operation" value on that row is equal to 0.

Hence I need R to:

df <- df %>% mutate( price = if_else(operation == 0, NA, price)

And then when price == NA to fill the value with the moving average of the price column. Since I could have consecutive NAs in the price column I think that the moving average should be applied with a rolling window.

I'm a new R user so I don't have a real idea of how to do that. Any idea? If possible I would prefer a dplyr solution

user438383
  • 5,716
  • 8
  • 28
  • 43

2 Answers2

0

Assuming that the intention is that the average of the modified price from the beginning to the current row of is to be calculated with 0 rows NA'd out so that they are eliminated from the average try this. For example, the third row should use the average of 10 and 19 which is 14.5.

The code specifies an average of n() (= number of rows) elements but partial=TRUE instructs it just to use however many there are if there are fewer. na.rm=TRUE results in NA's not being included in the average. We have excluded filter and lag from dplyr since they clobber the functions of the same names in R and so tend to lead to hard to detect errors. If you need to use them use dplyr::lag and dplyr::filter.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

df %>% 
 mutate(price = ifelse(operation == 0, NA, price),
        avg = rollapplyr(price, n(), mean, na.rm = TRUE, partial = TRUE))

This variation also works. 1:n() specifies that the i elements should be used for the ith row. Again, na.rm=TRUE will eliminate NA's from the calculation.

df %>%
  mutate(price = ifelse(operation == 0, NA, price),
         avg = rollapplyr(price, 1:n(), mean, na.rm = TRUE))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • i think the 1:n() is a too large interval, I just need to do the computation, I.E. the mean with the i-1 and i+1 rows. Suppose that after the "mutate" part in your code I end up having 3 consecutive NA rows, then R should take NA_1 and do the avg between raw i-1 and row i+3 (since the closest price is after 3 NAs), then for NA_2 should take raw i-1 and row i+2 and so on until all the NAs are completed. In the other raws the price is already computed with the "mutate" part How should I change your code? Thanks for the answer – Lorenzo Mazzucchelli May 23 '22 at 22:23
  • Do you want to take the average of offsets -1, 0 and 1 after replacing NAs with the average of the two closest non-NAs? That would `rollapply( (na.locf0(price) + na.locf0(price, fromLast=TRUE))/2, 3, mean, na.rm = TRUE, fill = NA)` or add `partial=TRUE` if that is what you want. – G. Grothendieck May 24 '22 at 13:34
0

if the intention is to mutate the price with the average prices from previous rows where the operation > 0; then here is my dplyr code.

df <- tibble(df)

  
  df %>% 
    mutate( price = ifelse( operation==0, 0 ,price)) %>% 
    mutate(runinngsumPrice = cumsum(price))  %>% 
    mutate(runinngsumNNA = cumsum(ifelse(operation==0,0,1)))  %>% 
    mutate( price = ifelse( operation==0, runinngsumPrice/runinngsumNNA ,price))  %>% 
    select(1:5)
            
# 
#   # A tibble: 11 x 5
#   inv   ass   datetime   price operation
#   <chr> <chr> <chr>      <dbl>     <dbl>
#     1 INV_1 x     2010-01-01 10           10
#   2 INV_1 x     2010-01-02 10            0
#   3 INV_1 x     2010-01-03 19            2
#   4 INV_1 y     2010-01-08  9            2
#   5 INV_1 y     2010-01-19 12.7          0
#   6 INV_2 x     2010-02-20  5            5
#   7 INV_2 x     2010-02-22  1            5
#   8 INV_2 x     2010-02-23  4            5
#   9 INV_2 t     2010-03-01  4            3
#   10 INV_2 t     2010-03-02  7.43         0
#   11 INV_2 t     2010-03-04  1            2
Melih
  • 26
  • 1