R: Impute missing data with mean of first previous and latter non missing data

Question

Assume the data look like:

df <- data.frame(ID=1:6, Value=c(NA, 1, NA, NA, 2, NA))
df
  ID Value
1  1    NA
2  2     1
3  3    NA
4  4    NA
5  5     2
6  6    NA

And I want the imputed result to look like:

  ID Value
1  1   1.0
2  2   1.0
3  3   1.5
4  4   1.5
5  5   2.0
6  6   2.0

More specifically, I want to impute each missing value with the mean of the nearest previous and the nearest following non-missing values. If only one of the two exists, impute with that value. Behavior when all values are missing is not defined.

How can I do that in R?

This seems to be what you're looking for: http://stackoverflow.com/questions/15308205/mean-before-after-imputation-in-r — Frank, Jun 19 '15 at 18:09
imputeTS::na_interpolation (na.interpolation in older releases) and zoo::na.approx might be worth a look for a similar solution (not 100% the requested result, admittedly) — Steffen Moritz, Dec 07 '17 at 14:26

IRTFM · Answer 1 · 2015-06-19T18:51:16.180

Take a look at the design of approxfun with rule=2. This isn't exactly what you asked for (since it does a linear interpolation across the NA gaps rather than substituting the mean of the gap endpoints), but it might be acceptable:

> approxfun(df$ID, df$Value, rule=2)(df$ID)
[1] 1.000000 1.000000 1.333333 1.666667 2.000000 2.000000

With rule=2 it does behave as you wanted at the extremes: positions outside the observed range take the nearest observed value. There are also na.approx methods in the zoo package.
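To make the effect of rule concrete, here is a small sketch using base R's approx (the non-function counterpart of approxfun). The names ok, r1 and r2 are only illustrative, and the NA rows are dropped explicitly so the call does not depend on how a given R version treats them.

```r
df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))

# Keep only the observed points as interpolation knots.
ok <- !is.na(df$Value)

# rule = 1: positions outside the observed range stay NA.
r1 <- approx(df$ID[ok], df$Value[ok], xout = df$ID, rule = 1)$y

# rule = 2: positions outside the observed range take the nearest observed value.
r2 <- approx(df$ID[ok], df$Value[ok], xout = df$ID, rule = 2)$y

rbind(rule1 = r1, rule2 = r2)
```

Only the first and last positions differ between the two rows. The interior gap is interpolated linearly in both cases, which is where this approach departs from the mean-of-endpoints rule in the question.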

I would caution against using such data for any further statistical inference. This method of imputation is essentially saying there is no possibility of random variation during periods of no measurement, and the world is generally not so consistent.

G. Grothendieck · Accepted Answer · 2015-06-19T20:48:15.973

Use na.locf both forwards and backwards and take their average:

library(zoo)

both <- cbind( na.locf(df$Value, na.rm = FALSE), 
               na.locf(df$Value, na.rm = FALSE, fromLast = TRUE))
transform(df, Value = rowMeans(both, na.rm = TRUE))

giving:

  ID Value
1  1   1.0
2  2   1.0
3  3   1.5
4  4   1.5
5  5   2.0
6  6   2.0
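If zoo is not available, the same forward/backward idea can be sketched in base R. This is only an illustration of the technique, not a replacement for na.locf; the helper names locf and impute_neighbor_mean are made up for this example.

```r
# Hypothetical helper: carry the last non-missing value forward (base R only).
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # count of non-missing values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 (NA) covers positions before the first value
}

# Hypothetical helper: average the previous and the next non-missing value.
impute_neighbor_mean <- function(x) {
  fwd <- locf(x)                   # nearest previous non-missing value
  bwd <- rev(locf(rev(x)))         # nearest following non-missing value
  rowMeans(cbind(fwd, bwd), na.rm = TRUE)
}

df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))
imputed <- impute_neighbor_mean(df$Value)
transform(df, Value = imputed)
```

For the example data this yields 1, 1, 1.5, 1.5, 2, 2. Observed values are left unchanged because their forward and backward fills are identical, and na.rm = TRUE handles the ends, where only one neighbor exists.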

answered Jun 19 '15 at 20:15, edited Jun 19 '15 at 20:48


score 0 · Answer 3 · answered Jun 19 '15 at 17:44

This should work.

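for (i in seq_len(nrow(df))) {
    if (is.na(df$Value[i])) {
        # na.rm = TRUE is required: df$Value[i] is itself NA, so without it the mean is always NA.
        # A leading NA has no earlier values, so it becomes NaN.
        df$Value[i] <- mean(df$Value[1:i], na.rm = TRUE)
    }
}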

I don't know if this is exactly what you want, because I didn't understand this statement: "I want to impute missing data with mean of first previous and latter non missing data, if only one of previous or latter non missing data exist, impute with this non missing data".

What values do you want to find the mean of to replace the NAs?
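For comparison, one literal reading of the quoted rule is: for each NA, take the closest observed value before it and the closest observed value after it, and average whichever of the two exist. A loop along those lines might look like the sketch below; the variable names known, filled and neighbours are illustrative, and the all-missing case is simply left as NA since the question leaves it undefined.

```r
df <- data.frame(ID = 1:6, Value = c(NA, 1, NA, NA, 2, NA))

known  <- which(!is.na(df$Value))   # positions of observed values
filled <- df$Value                  # copy to fill; lookups use the original column

for (i in which(is.na(df$Value))) {
  prev_idx <- known[known < i]      # observed positions before i
  next_idx <- known[known > i]      # observed positions after i
  neighbours <- c(
    if (length(prev_idx)) df$Value[max(prev_idx)],
    if (length(next_idx)) df$Value[min(next_idx)]
  )
  # Leave the value as NA when there is no observed neighbour at all.
  if (length(neighbours)) filled[i] <- mean(neighbours)
}

filled
```

Reading from the original column rather than from filled keeps earlier imputations from feeding into later ones.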
