1

I need to replace NAs with the mean of previous three values, by group. Once an NA is replaced, it will serve as input for computing the mean corresponding to the next NA (if next NA is within the next three months).

Here it is an example:

id   date   value
1 2017-04-01 40
1 2017-05-01 40
1 2017-06-01 10
1 2017-07-01 NA
1 2017-08-01 NA
2 2014-01-01 27
2 2014-02-01 13

Data:

dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), date = structure(c(17257, 17287, 17318, 17348, 17379, 16071, 16102), class = "Date"), value = c(40, 40, 10, NA, NA, 27, 13)), row.names = c(1L, 2L, 3L, 4L, 5L, 8L, 9L), class = "data.frame")

The output should look like:

id   date   value
1 2017-04-01 40.00
1 2017-05-01 40.00
1 2017-06-01 10.00
1 2017-07-01 30.00
1 2017-08-01 26.66
2 2014-01-01 27.00
2 2014-02-01 13.00

where 26.66 = (30 + 10 + 40)/3

What is an efficient way to do this (i.e. to avoid for loops)?

NRLP
  • 568
  • 3
  • 16

2 Answers2

2

The following uses base R only and does what you need.

sp <- split(dt, dt$id)
sp <- lapply(sp, function(DF){
  for(i in which(is.na(DF$value))){
    tmp <- DF[seq_len(i - 1), ]
    DF$value[i] <- mean(tail(tmp$value, 3))
  }
  DF
})

result <- do.call(rbind, sp)
row.names(result) <- NULL

result
#  id       date    value
#1  1 2017-01-04 40.00000
#2  1 2017-01-05 40.00000
#3  1 2017-01-06 10.00000
#4  1 2017-01-07 30.00000
#5  1 2017-01-08 26.66667
#6  2 2014-01-01 27.00000
#7  2 2014-01-02 13.00000
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

Define a roll function which takes 3 or less previous values as a list and the current value and returns as a list the previous 2 values with the current value if the current value is not NA and the prevous 2 values with the mean if the current value is NA. Use that with Reduce and pick off the last value of each list in the result. Then apply all that to each group using ave.

roll <- function(prev, cur) {
  prev <- unlist(prev)
  list(tail(prev, 2), if (is.na(cur)) mean(prev) else cur)
}

reduce_roll <- function(x) {
  sapply(Reduce(roll, init = x[1], x[-1], acc = TRUE), tail, 1)
}

transform(dt, value = ave(value, id, FUN = reduce_roll))

giving:

  id       date    value
1  1 2017-04-01       40
2  1 2017-05-01       40
3  1 2017-06-01       10
4  1 2017-07-01       30
5  1 2017-08-01 26.66667
8  2 2014-01-01       27
9  2 2014-02-01       13
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Neat answer, thanks. Also, I corrected the cut off data (it was a copy-paste mistake which I didn't notice). – NRLP Nov 22 '18 at 19:04