I have the following data frame:

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

Can someone suggest an approach for replacing the NA values in df by spreading the difference in the value column evenly over time? For id a, value is 100 at time 1 and 550 at time 4. How would one change the NAs at times 2 and 3 to 250 and 400, and likewise to 500 and 700 for id b at times 2 and 3?

I can write a complex for loop to brute-force it, but is there a more efficient solution?

Jake Russ

2 Answers

You could use na.approx from the zoo package:

library(zoo)
df$value <- na.approx(df$value)
df
#  id time value
#1  a    1   100
#2  a    2   250
#3  a    3   400
#4  a    4   550
#5  b    1   300
#6  b    2   500
#7  b    3   700
#8  b    4   900
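
One caveat: the call above interpolates over the whole column at once, which works here because every id begins and ends with an observed value. If a group started or ended with NA, values could be interpolated across the group boundary. A minimal sketch of a per-group variant, assuming the same df and combining base R's ave with na.approx(..., na.rm = FALSE) so that each group keeps its original length (the column name value_by_id is made up for illustration):

```r
library(zoo)

# Rebuild the example data so the sketch is self-contained
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Interpolate within each id only; na.rm = FALSE keeps any leading or
# trailing NAs in place instead of dropping them
df$value_by_id <- ave(df$value, df$id,
                      FUN = function(v) na.approx(v, na.rm = FALSE))
df$value_by_id
```

For this particular data the result should match the ungrouped call, since no group has NAs at its edges.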
akrun

Or you can write your own vectorized version of na.approx, with no loops and no external packages:

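# Assumes only the first and last elements of x are non-missing
myna.approx <- function(x){
  len <- length(x)
  # start at x[1] and cumulatively add the constant step (x[len] - x[1]) / (len - 1)
  cumsum(c(x[1L], rep((x[len] - x[1L])/(len - 1L), len - 1L)))
}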

with(df, ave(value, id, FUN = myna.approx))
## [1] 100 250 400 550 300 500 700 900
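
The function above relies on only the first and last value of each group being observed, and on evenly spaced time steps. Where that may not hold, base R's approx can do the same linear interpolation against the time column. A rough sketch of that alternative, not the method from this answer: interp_by_time is a hypothetical helper name, and approx needs at least two non-NA values per group.

```r
# Rebuild the example data so the sketch is self-contained
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Hypothetical helper: linearly interpolate value at the group's own time
# points; approx() ignores the NA pairs and uses the observed ones as knots
interp_by_time <- function(d) approx(d$time, d$value, xout = d$time)$y

# Apply per id, then put the results back in the original row order
filled <- unsplit(lapply(split(df, df$id), interp_by_time), df$id)
filled
```

Because the observed points act as knots, this version also respects any interior non-NA values, which the cumsum approach would overwrite.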
David Arenburg
  • I accepted the zoo solution because I think in general people will be looking for pre-existing functions, even though I learned more from your answer. – Jake Russ Mar 17 '15 at 19:46
  • That's fine, I would use `na.approx` too. All I wanted to illustrate is that in R you should try to think vectorized, and that 95% of day-to-day tasks can be solved without writing a single loop, no matter how hard the task seems at first glance. – David Arenburg Mar 17 '15 at 20:31