How to replace NAs with the linear interpolation between known observations?

Question

I have the following data frame:

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

Can someone suggest an approach for replacing the NA values in df by spreading the difference in the value column evenly over time (linear interpolation)? For id a, the value is 100 at time 1 and 550 at time 4. How would one change the NAs at times 2 and 3 to 250 and 400? And then to 500 and 700 for id b at times 2 and 3?

I can write a complex for loop to brute force it, but is there a more efficient solution?

score 12 · Accepted Answer · answered Mar 17 '15 at 17:41

You could use `na.approx` from the zoo package:

library(zoo)
df$value <- na.approx(df$value)
df
#  id time value
#1  a    1   100
#2  a    2   250
#3  a    3   400
#4  a    4   550
#5  b    1   300
#6  b    2   500
#7  b    3   700
#8  b    4   900
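A caveat, not part of the original answer: `na.approx(df$value)` interpolates down the whole column and ignores `id`. It gives the right result here only because every group starts and ends with a known value; if a group began or ended with NA, the interpolation would run across the boundary and borrow a value from the neighbouring group. Below is a minimal base-R sketch that interpolates within each `id` and against the `time` column (so unevenly spaced times are weighted correctly). `fill_linear_by_group` is a hypothetical helper name, and the sketch assumes every group has at least two known values, since `approx()` stops with an error otherwise.

```r
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Hypothetical helper: linearly interpolate `value` against `time`
# separately within each `id`, using only base R (stats::approx).
fill_linear_by_group <- function(df) {
  filled <- lapply(split(df, df$id), function(g) {
    # approx() drops (time, value) pairs whose value is NA, then evaluates
    # the interpolating line at every time in the group. Times before the
    # first or after the last known value stay NA (the default rule = 1).
    approx(g$time, g$value, xout = g$time)$y
  })
  # Put the per-group results back in the original row order.
  df$value <- unsplit(filled, df$id)
  df
}

fill_linear_by_group(df)
```

Because each group is interpolated on its own, a missing value at the start or end of a group is left as NA instead of being filled from another group.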

akrun

score 7 · Answer 2 · answered Mar 17 '15 at 18:11

Or you can write your own vectorized version of `na.approx`, with no complicated loops and no external packages:

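myna.approx <- function(x){
  len <- length(x)
  # start at the first value, then add a constant step that spreads the
  # first-to-last difference evenly over the remaining len - 1 positions
  cumsum(c(x[1L], rep((x[len] - x[1L])/(len - 1L), len - 1L)))
}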

with(df, ave(value, id, FUN = myna.approx))
## [1] 100 250 400 550 300 500 700 900
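One thing to be aware of with `myna.approx`: it draws a single straight line from the first to the last element of each group, so it assumes only the endpoints are known. A known value in the middle gets overwritten; for example, `myna.approx(c(100, NA, 300, 550))` returns 100, 250, 400, 550, replacing the 300. A still loop-free sketch that fills each NA from its nearest known neighbours on either side could look like this. `myna.approx2` is a hypothetical name; like the original it works by position, so it assumes each group's rows are sorted by time and equally spaced, and it leaves an NA untouched when there is no known value on one side of it.

```r
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Hypothetical variant of myna.approx that keeps known interior values.
myna.approx2 <- function(x) {
  n <- length(x)
  idx <- seq_len(n)
  known <- !is.na(x)
  # Position of the nearest known value at or before each element (0 = none).
  prev <- cummax(ifelse(known, idx, 0L))
  # Position of the nearest known value at or after each element (n + 1 = none).
  nxt <- rev(cummin(rev(ifelse(known, idx, n + 1L))))
  # Fill only NAs that have a known value on both sides.
  fill <- !known & prev >= 1L & nxt <= n
  p <- prev[fill]
  q <- nxt[fill]
  # Linear interpolation between x[p] and x[q] at position idx[fill].
  x[fill] <- x[p] + (x[q] - x[p]) * (idx[fill] - p) / (q - p)
  x
}

with(df, ave(value, id, FUN = myna.approx2))
```

The `cummax`/`cummin` pair does the "find the previous and next known value" search that a loop would otherwise do element by element.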

David Arenburg

I accepted the zoo solution because I think people will generally be looking for pre-existing functions, even though I learned more from your answer. – Jake Russ Mar 17 '15 at 19:46

That's fine, I would use `na.approx` too. All I wanted to illustrate is that in R you should try to think vectorized, and that 95% of day-to-day tasks can be solved without writing a single loop, no matter how hard the task seems at first glance. – David Arenburg Mar 17 '15 at 20:31
