Using Kalman smoothing in R's KFAS package to impute missing data

Question

I have a data frame (reproducible example at the bottom) containing a column of values representing precipitation volume, a column of date-of-measurement values, and a column each for lat, lon, and elevation coordinates. The data covers 10 years of measurement, and 10 different lat/long/elev points (levels which I will call "stations").

About 3.4% of the values in the precipitation column are missing completely at random (MCAR). My goal is to impute the missing values, taking into account both the temporal correlation (the NA's position within its station's time series) and the spatial correlation (the NA's geographic relationship to the rest of the points).

I do not think typical ARIMA-based techniques, such as those found in Amelia or imputeTS, will suffice, because they are limited to univariate data.
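To make the univariate limitation concrete, here is a minimal sketch of what I mean by imputing one station at a time. It assumes a recent imputeTS release where the function is called na_kalman (older releases named it na.kalman), and the series x is hypothetical toy data, not my real measurements.

```r
library(imputeTS)

set.seed(76)
# Hypothetical toy series standing in for ONE station (not real data)
x <- 3 + cumsum(rnorm(60, sd = 0.5))
x[c(5, 17, 18, 40)] <- NA

# Fit a structural time series model and fill the gaps with the
# Kalman-smoothed values; only this single series is used.
x_imp <- na_kalman(x, model = "StructTS", smooth = TRUE)
```

This fills each gap from the station's own history only, so nothing is borrowed from neighbouring stations.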

I am interested in using the KFAS package because I believe it will allow me to treat the different "stations" as "states" within the "state space", and to use Kalman smoothing to "predict" the missing values based on both the spatial and the temporal correlation.
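To be explicit about what I have in mind, this is the kind of model I picture. It is my own assumption (a multivariate local level model), not something I have confirmed is the right structure for precipitation. Here y_t is the vector of precipitation values at all stations on day t, and mu_t is the vector of unobserved station levels (the states).

```latex
\begin{aligned}
y_t       &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, H),\\
\mu_{t+1} &= \mu_t + \eta_t,        & \eta_t        &\sim N(0, Q).
\end{aligned}
```

The temporal correlation would come from the random-walk evolution of mu_t, and the between-station correlation from the off-diagonal elements of Q. Under these assumptions, a missing element of y_t would be replaced by the corresponding element of the smoothed estimate of mu_t.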

My trouble is that I am having a VERY hard time getting over KFAS's learning curve and implementing this model. The documentation is sparse, and there are next to no tutorials or beginner-focused materials out there. I feel like I don't even know how to get started.

Can KFAS be used this way? How would you approach this challenge? What would the basic steps look like in KFAS?
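For concreteness, here is a rough sketch of the workflow I imagine, pieced together from the package documentation. Everything about the data is hypothetical: three simulated stations observed on the same 60 days. The helper update_fn is my own and not part of KFAS, the Gaussian local level model is an assumption that ignores the fact that precipitation is non-negative and often zero, and the cross-station covariance Q is estimated from the data rather than derived from lat/lon/elevation.

```r
library(KFAS)

set.seed(76)
n_days <- 60
# Hypothetical toy data: three stations sharing a common signal,
# so that the series are correlated across stations.
common <- cumsum(rnorm(n_days, sd = 0.5))
y <- sapply(1:3, function(i) 3 + common + rnorm(n_days, sd = 0.3))
colnames(y) <- c("s1", "s2", "s3")
y_obs <- y
miss <- matrix(FALSE, n_days, 3)
miss[sample(length(y_obs), 9)] <- TRUE   # knock out 9 of the 180 values
y_obs[miss] <- NA

# One level state per station; the NAs in Q and H mark parameters
# that still have to be estimated.
model <- SSModel(
  y_obs ~ SSMtrend(1, Q = list(matrix(NA, 3, 3)), type = "distinct"),
  H = diag(NA_real_, 3)
)

# Hypothetical helper (my own): maps 9 unconstrained parameters to a
# full covariance Q = L L' and a diagonal observation covariance H.
update_fn <- function(pars, model, ...) {
  L <- matrix(0, 3, 3)
  L[lower.tri(L, diag = TRUE)] <- pars[1:6]
  diag(L) <- exp(diag(L))
  model$Q[, , 1] <- L %*% t(L)
  model$H[, , 1] <- diag(exp(pars[7:9]))
  model
}

# Starting values: positions 1, 4 and 6 are the diagonal of L.
inits <- c(log(0.5), 0, 0, log(0.5), 0, log(0.5), rep(log(0.1), 3))
fit <- fitSSM(model, inits = inits, updatefn = update_fn, method = "BFGS")

# Kalman smoothing; muhat holds the smoothed mean of each series.
out <- KFS(fit$model, smoothing = c("state", "mean"))

# Replace only the missing cells with their smoothed estimates.
imputed <- y_obs
imputed[miss] <- out$muhat[miss]
```

If this is roughly right, the steps would be: build the model with SSModel, estimate the unknown variances with fitSSM, run KFS, and read the imputations off the smoothed means. I am not sure this is the intended way to use the package, or how the station coordinates should enter the model.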

Since I barely know how to frame this question, I have made an effort to make good reproducible data. This sample data covers three "stations" over 1 month, which I'm thinking should be sufficient for demonstration. The values are realistic but not accurate.

#defining the precip variable
set.seed(76)
precip <- sample(0:7, 30, replace=TRUE)

#defining the station coordinates (lon, lat, elev) and the measurement dates
lon1 <- (-123.7500)
lon2 <- (-124.1197)
lon3 <- (-124.0961)
lat1 <- (43.9956)
lat2 <- (44.0069)
lat3 <- (44.0272)
elev1 <- 76.2
elev2 <- 115.8
elev3 <- 3.7
date1 <- seq(as.Date('2011-01-01'), as.Date('2011-01-10'),by=1)
date2 <- seq(as.Date('2011-01-11'), as.Date('2011-01-20'),by=1)
date3 <- seq(as.Date('2011-01-21'), as.Date('2011-01-30'),by=1) 

#creating the df
reprex.data <- NULL
reprex.data$precip <- precip

#inserting NA's randomly into the precip vector now to easily avoid doing it to the other variables 
reprex.data <- as.data.frame(lapply(reprex.data, function(cc) cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]))

#creating the rest of the df (rows 1-10 are station 1, 11-20 station 2, 21-30 station 3)
#full-length columns are built with rep(), because assigning to reprex.data$lon[1:10]
#on a 30-row data frame fails with "replacement has 10 rows, data has 30"
reprex.data$lon <- rep(c(lon1, lon2, lon3), each = 10)
reprex.data$lat <- rep(c(lat1, lat2, lat3), each = 10)
reprex.data$elev <- rep(c(elev1, elev2, elev3), each = 10)
#c() keeps the Date class, so the dates are not converted to plain numbers
reprex.data$date <- c(date1, date2, date3)

#voila
head(reprex.data)
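
As far as I can tell, a multivariate state space model wants the data as a matrix with one row per date and one column per station, so I assume the first step is reshaping from long to wide. A minimal base R sketch follows; the long data frame here is hypothetical and uses the same dates for every station, unlike the reprex above, where each station covers its own 10-day window.

```r
# Hypothetical long-format data: 3 stations, the same 5 dates each
long <- data.frame(
  station = rep(c("s1", "s2", "s3"), each = 5),
  date    = rep(seq(as.Date("2011-01-01"), by = 1, length.out = 5), times = 3),
  precip  = c(0, 2, NA, 5, 1,
              1, NA, 3, 4, 0,
              2, 2, 6, NA, 1)
)

# One row per date, one precip column per station
wide <- reshape(long, idvar = "date", timevar = "station", direction = "wide")

# Numeric matrix (dates in rows, stations in columns); NAs are kept
y <- as.matrix(wide[, -1])
```

The resulting y has columns precip.s1, precip.s2 and precip.s3, with the missing values still marked as NA.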

https://www.pfeg.noaa.gov/outgoing/rmendels/KFAS/KFAS.nb.html Here is an example of the basic use of the package. Still, I don't know how you would account for the spatial relationship. - RLave, Sep 28 '18 at 07:13
I also found this package: https://cran.r-project.org/web/packages/SpatioTemporal/vignettes/ST_intro.pdf. Maybe it can get you somewhere. - RLave, Sep 28 '18 at 07:19
Unfortunately, I also have no experience with KFAS. You are right that imputeTS is specialized for univariate time series. But this is not true for Amelia, which is made for multivariate imputation. It has the polytime option and the lags and leads options to account, to some degree, for temporal correlation (see page 21 of the manual, https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf). In my opinion this is not the perfect way to account for temporal correlation, but it definitely uses both correlations (spatial and temporal), as you want. - Steffen Moritz, Sep 30 '18 at 15:57
Amelia, per page 21 of amelia.pdf, allows multiple imputation but not Kalman smoothing, if I understand correctly. - Clayton Glasser, Oct 17 '18 at 02:12
