I have a large dataset with many missing values, and I want to impute them within each group of "name", using either linear interpolation or a moving average.
d <- data.frame(
name = c('a', 'a','a','a','b','b','b','b','c','c','c','c'),
year = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
V = c(NA, 21, 31, 41, 11, NA, NA, 41, NA, NA, NA, 41),
W = c(11, NA, 31, 41, 11, 21, NA, NA, NA, NA, 31, NA),
X = c(11, 21, NA, 41, NA, 21, NA, 41, 11, NA, NA, NA),
Y = c(11, 21, 31, NA, NA, 21, 31, NA, NA, 21, NA, NA),
Z = c(NA, NA, 31, 41, 11, NA, 31, NA, NA, NA, NA, NA)
)
> d
name year V W X Y Z
1 a 1 NA 11 11 11 NA
2 a 2 21 NA 21 21 NA
3 a 3 31 31 NA 31 31
4 a 4 41 41 41 NA 41
5 b 1 11 11 NA NA 11
6 b 2 NA 21 21 21 NA
7 b 3 NA NA NA 31 31
8 b 4 41 NA 41 NA NA
9 c 1 NA NA 11 NA NA
10 c 2 NA NA NA 21 NA
11 c 3 NA 31 NA NA NA
12 c 4 41 NA NA NA NA
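Linear interpolation needs at least two observed points per group, so it may help to first count the observed values per group and column. Here is a small base-R check, assuming the data frame above. `counts` is just an illustrative name.

```r
d <- data.frame(
  name = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'),
  year = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  V = c(NA, 21, 31, 41, 11, NA, NA, 41, NA, NA, NA, 41),
  W = c(11, NA, 31, 41, 11, 21, NA, NA, NA, NA, 31, NA),
  X = c(11, 21, NA, 41, NA, 21, NA, 41, 11, NA, NA, NA),
  Y = c(11, 21, 31, NA, NA, 21, 31, NA, NA, 21, NA, NA),
  Z = c(NA, NA, 31, 41, 11, NA, 31, NA, NA, NA, NA, NA)
)

# Number of non-NA values in each value column, per name group.
# na.action = na.pass keeps rows containing NAs so FUN can count them.
counts <- aggregate(cbind(V, W, X, Y, Z) ~ name, data = d,
                    FUN = function(v) sum(!is.na(v)), na.action = na.pass)
counts
```

In this sample every column of group c has at most one observed value, so within-group interpolation alone cannot produce the 11/21/31/41 pattern shown for c in the desired output. That would need some extra assumption.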
Ideally, the result would be as close as possible to the following:
name year V W X Y Z
1 a 1 11 11 11 11 11
2 a 2 21 21 21 21 21
3 a 3 31 31 31 31 31
4 a 4 41 41 41 41 41
5 b 1 11 11 11 11 11
6 b 2 21 21 21 21 21
7 b 3 31 31 31 31 31
8 b 4 41 41 41 41 41
9 c 1 11 11 11 11 NA
10 c 2 21 21 21 21 NA
11 c 3 31 31 31 31 NA
12 c 4 41 41 41 41 NA
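The desired values at the edges (for example Z = 11, 21 in group a, filled from 31, 41) require linear extrapolation as well as interpolation. `stats::approx()` returns NA outside the observed range by default, or repeats the end value with `rule = 2`. Below is a minimal base-R sketch of grouped interpolation plus extrapolation. It assumes rows are already sorted by year within each name. `interp_extrap` is a hypothetical helper, and columns with fewer than two observed values in a group are left unchanged.

```r
d <- data.frame(
  name = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'),
  year = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  V = c(NA, 21, 31, 41, 11, NA, NA, 41, NA, NA, NA, 41),
  W = c(11, NA, 31, 41, 11, 21, NA, NA, NA, NA, 31, NA),
  X = c(11, 21, NA, 41, NA, 21, NA, 41, 11, NA, NA, NA),
  Y = c(11, 21, 31, NA, NA, 21, 31, NA, NA, 21, NA, NA),
  Z = c(NA, NA, 31, 41, 11, NA, 31, NA, NA, NA, NA, NA)
)

# Hypothetical helper: linear interpolation between observed points, plus
# linear extrapolation beyond the first/last observation. Assumes x is sorted
# ascending. Vectors with fewer than 2 non-NA values are returned unchanged.
interp_extrap <- function(y, x = seq_along(y)) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)
  xo <- x[ok]
  yo <- y[ok]
  n <- length(xo)
  out <- approx(xo, yo, xout = x)$y   # NA outside [xo[1], xo[n]]
  before <- x < xo[1]
  after <- x > xo[n]
  # Extend the line through the first two / last two observed points
  out[before] <- yo[1] + (x[before] - xo[1]) * (yo[2] - yo[1]) / (xo[2] - xo[1])
  out[after] <- yo[n] + (x[after] - xo[n]) * (yo[n] - yo[n - 1]) / (xo[n] - xo[n - 1])
  out
}

cols <- c("V", "W", "X", "Y", "Z")
d_lin <- d
for (v in cols) {
  # ave() splits the column by name, applies the helper to each piece and
  # reassembles it in the original row order; position stands in for year.
  d_lin[[v]] <- ave(d[[v]], d$name, FUN = interp_extrap)
}
d_lin
```

On this sample, groups a and b come out as in the desired table. Group c is returned as in the input, because each of its columns has at most one observation. With dplyr, the loop could instead be written as `group_by(name)` followed by `mutate(across(V:Z, ~ interp_extrap(.x, year)))`, which passes the actual year as `x`.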
I found this and this. I tried the following (without grouping yet), but neither worked:
data.frame(lapply(d, function(X) approxfun(seq_along(X), X)(seq_along(X))))
imputeTS::na_ma(d, k = 2, weighting = "simple")
The first one gave the following error:
Error in approxfun(seq_along(X), X) :
need at least two non-NA values to interpolate
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) :
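The error probably comes from the character column `name` rather than from the numeric ones. `lapply(d, ...)` also visits `name`, its strings become NA when coerced to numbers (presumably what the `xy.coords()` warning is about), and that leaves zero usable values. The sketch below checks this by counting values that survive numeric coercion in each column. It then repeats the original call on the value columns only. `non_na` and `ungrouped` are illustrative names.

```r
d <- data.frame(
  name = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'),
  year = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  V = c(NA, 21, 31, 41, 11, NA, NA, 41, NA, NA, NA, 41),
  W = c(11, NA, 31, 41, 11, 21, NA, NA, NA, NA, 31, NA),
  X = c(11, 21, NA, 41, NA, 21, NA, 41, 11, NA, NA, NA),
  Y = c(11, 21, 31, NA, NA, 21, 31, NA, NA, 21, NA, NA),
  Z = c(NA, NA, 31, 41, 11, NA, 31, NA, NA, NA, NA, NA)
)

# Values per column that survive numeric coercion; `name` drops to 0
non_na <- sapply(d, function(col) sum(!is.na(suppressWarnings(as.numeric(col)))))
non_na

# The original call restricted to the value columns runs without error, but it
# ignores the groups: e.g. V in row 9 (group c) is filled from group b's rows.
cols <- c("V", "W", "X", "Y", "Z")
ungrouped <- data.frame(lapply(d[cols], function(y) approxfun(seq_along(y), y)(seq_along(y))))
ungrouped
```

Row 1 of V stays NA because `approxfun()` does not extrapolate by default. This is another reason the per-group approach is needed.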
So I tried the second one, but it kept running for a long time without returning anything. According to the reply in the first link, the package requires time-series/vector input (which is why each column has to be passed separately).
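Because imputeTS works on one vector at a time, one option is to call it per column and per group, e.g. `ave(d$V, d$name, FUN = function(v) imputeTS::na_ma(v, k = 2, weighting = "simple"))`. How `na_ma()` handles groups with one or zero observed values is worth checking in its documentation. The sketch below shows the same grouping idea without the dependency. It uses a hand-written, hypothetical `ma_impute` helper (a plain unweighted window mean, not imputeTS's exact algorithm).

```r
d <- data.frame(
  name = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'),
  year = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  V = c(NA, 21, 31, 41, 11, NA, NA, 41, NA, NA, NA, 41),
  W = c(11, NA, 31, 41, 11, 21, NA, NA, NA, NA, 31, NA),
  X = c(11, 21, NA, 41, NA, 21, NA, 41, 11, NA, NA, NA),
  Y = c(11, 21, 31, NA, NA, 21, 31, NA, NA, 21, NA, NA),
  Z = c(NA, NA, 31, 41, 11, NA, 31, NA, NA, NA, NA, NA)
)

# Hypothetical helper: replace each NA with the unweighted mean of the observed
# values within k positions on either side. Only original observations are used
# (earlier imputations do not feed later ones); NAs with an all-NA window stay NA.
ma_impute <- function(y, k = 2) {
  out <- y
  n <- length(y)
  for (i in which(is.na(y))) {
    window <- y[max(1, i - k):min(n, i + k)]
    if (any(!is.na(window))) out[i] <- mean(window, na.rm = TRUE)
  }
  out
}

cols <- c("V", "W", "X", "Y", "Z")
d_ma <- d
for (v in cols) {
  # Apply the helper separately within each name group
  d_ma[[v]] <- ave(d[[v]], d$name, FUN = function(y) ma_impute(y, k = 2))
}
d_ma
```

A moving average pulls edge values toward the local mean instead of following the trend. For example, group a's V in year 1 becomes 26 (the mean of 21 and 31) rather than 11. It is therefore unlikely to reproduce the desired table on this kind of trending data.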
Any help is greatly appreciated!