I have a data frame (reproducible example at the bottom) with a column of precipitation volumes, a column of measurement dates, and one column each for the latitude, longitude, and elevation coordinates. The data cover 10 years of measurements at 10 distinct lat/lon/elev points (which I will call "stations").
The precipitation column is missing 3.4% of its values, and they are missing completely at random (MCAR). My goal is to impute the missing values while accounting for both the temporal correlation (each NA's position within its station's time series) and the spatial correlation (each NA's geographic relationship to the other stations).
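One way to make the "geographic relationship" concrete is a distance-based correlation between stations. The base-R sketch below computes great-circle (haversine) distances between the three reprex stations and turns them into an exponentially decaying correlation matrix. The exponential form, the function name `haversine_km`, and the 20 km range are illustrative assumptions, not values estimated from the data, and elevation is ignored.

```r
# Great-circle distance in km between points given in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Coordinates of the three stations from the reprex
stations <- data.frame(
  lon = c(-123.7500, -124.1197, -124.0961),
  lat = c(43.9956, 44.0069, 44.0272)
)

# Pairwise distance matrix (haversine_km is vectorized, so outer() works)
idx <- seq_len(nrow(stations))
D <- outer(idx, idx, function(i, j)
  haversine_km(stations$lat[i], stations$lon[i],
               stations$lat[j], stations$lon[j]))

# Hypothetical exponential correlation: nearby stations correlate more
range_km <- 20
R <- exp(-D / range_km)
round(D, 1)
```

Stations 2 and 3 are only about 3 km apart, so under this assumed form they would be much more strongly correlated with each other than either is with station 1, which is roughly 28 to 30 km away.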
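I do not think typical univariate techniques, such as the ARIMA- and structural-model-based Kalman imputation in imputeTS, will suffice, because they treat each station's series on its own. (Amelia does accept time-series cross-section data through its `ts` and `cs` arguments, but it is a multiple-imputation method based on EM with bootstrapping, not an ARIMA or state-space model.)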
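For comparison, the per-station (univariate) approach can be sketched in base R with a local level model from `stats::StructTS()` and `tsSmooth()`, which is roughly what Kalman-smoothing imputation does for a single series. The values in `x` are made up for illustration; note that `StructTS()` requires the first value of the series to be non-missing.

```r
# Univariate Kalman-smoothing imputation for ONE station (made-up values)
x <- c(3, 5, NA, 4, 6, NA, NA, 2, 1, 4)

# Fit a local level model; StructTS() accepts NAs except in the first position
fit <- StructTS(x, type = "level")

# Smoothed level at every time point, given all observed values of this series
level <- as.numeric(tsSmooth(fit))

# Fill only the gaps; observed values are kept unchanged
x_imputed <- x
x_imputed[is.na(x)] <- level[is.na(x)]
x_imputed
```

Each station would be handled separately here, so a rainy day at a neighbouring station has no influence on the imputed value; that is the limitation a multivariate state-space model is meant to remove.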
I am interested in using the KFAS package because I believe it will let me treat these different "stations" as "states" within the "state space" and use Kalman smoothing to "predict" the missing values from both the spatial and the temporal correlation.
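In state-space terms, the stations would more typically enter as the elements of a multivariate observation vector, while the states are latent quantities such as an underlying level per station. A minimal sketch, assuming a Gaussian local level model for each of the p stations (a simplification for precipitation, which is non-negative and has many zeros), with y_t = (y_{t,1}, ..., y_{t,p}) the values of all stations on date t:

```latex
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, && \varepsilon_t \sim N(0, H), \quad H \text{ diagonal}, \\
\mu_{t+1} &= \mu_t + \eta_t, && \eta_t \sim N(0, Q), \\
Q_{ij} &= \sigma^2 \exp(-d_{ij}/\rho) && \text{(one possible distance-based choice for } Q\text{)}, \\
\hat{y}_{t,i} &= \hat{\mu}_{t,i} = \mathrm{E}(\mu_{t,i} \mid \text{all observed } y) && \text{for each missing } y_{t,i}.
\end{aligned}
```

The off-diagonal entries of Q carry the cross-station correlation; they can either be left unstructured and estimated, or tied to the inter-station distance d_ij through an assumed form such as the one above (sigma^2 and the range rho are hypothetical parameters). A missing y_{t,i} simply drops out of the Kalman filter update at time t, and the smoother then uses the observed values at neighbouring dates and at the other stations to estimate mu_{t,i}. Because H is diagonal, the measurement noise of the missing cell is independent of everything observed, so the smoothed level is also the smoothed estimate of the missing observation.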
My trouble is that I am having a VERY hard time getting over KFAS's learning curve and implementing this model. The documentation is sparse, and there are next to no tutorials or beginner-focused materials out there. I feel like I don't even know how to get started.
Can KFAS be used this way? How would you approach this challenge? What would the basic steps look like in KFAS?
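A minimal KFAS sketch of the basic steps (build the model, estimate the unknown variances, smooth, fill the gaps) might look like the following. It rests on several assumptions: the data are already a matrix with one row per date and one column per station; `Y` is simulated (three stations sharing a random-walk signal) rather than real precipitation; each station gets a local level with an unstructured level covariance `Q`, so cross-station correlation is estimated instead of derived from distances; and the default update function of `fitSSM()` is used to estimate the `NA` entries of `Q` and the diagonal `H`.

```r
library(KFAS)

# Simulated stand-in data (assumption): 3 stations x 120 days, driven by a
# shared random-walk signal plus station-specific noise
set.seed(1)
n <- 120
p <- 3
common <- cumsum(rnorm(n, sd = 0.5))
Y <- sapply(seq_len(p), function(j) 5 + common + rnorm(n, sd = 1))
colnames(Y) <- paste0("station", seq_len(p))

# Knock out about 3.4% of the cells completely at random
Y[sample(length(Y), round(0.034 * length(Y)))] <- NA
Yts <- ts(Y)

# Local level per station: Q (level innovations) is a full p x p matrix of
# unknowns, so cross-station correlation is estimated; H is diagonal
Q <- matrix(NA, p, p)
H <- matrix(0, p, p)
diag(H) <- NA
model <- SSModel(Yts ~ SSMtrend(1, Q = list(Q)), H = H)

# The default update function parameterizes a full NA covariance through its
# Cholesky factor: one parameter per NA in the upper triangle of Q, plus one
# per NA on the diagonal of H
npar <- sum(is.na(Q[upper.tri(Q, diag = TRUE)])) + sum(is.na(diag(H)))
fit <- fitSSM(model, inits = rep(0, npar), method = "BFGS")

# Kalman smoothing: muhat holds E(y_t | all observed data) for every cell
out <- KFS(fit$model, smoothing = c("state", "mean"))
mu <- matrix(as.numeric(out$muhat), nrow = n, ncol = p)

# Replace only the missing cells
Y_imputed <- Y
Y_imputed[is.na(Y)] <- mu[is.na(Y)]
```

It is worth checking `fit$optim.out$convergence` (0 means `optim()` reported success) before trusting the estimates. With a Gaussian model the imputed values can come out negative, so for real precipitation one might clamp them at zero or model a transformed series instead; a distance-based `Q` would need a custom `updatefn` passed to `fitSSM()`.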
Since I barely know how to frame this question, I have made an effort to provide good reproducible data. This sample data covers three "stations" with 10 daily values each (January 1 to 30, 2011, one 10-day block per station), which I think should be sufficient for demonstration. The values are realistic but not accurate.
#defining the precip variable
set.seed(76)
precip <- sample(0:7, 30, replace=TRUE)
#defining the station coordinates (lon, lat, elevation) and the dates
lon1 <- (-123.7500)
lon2 <- (-124.1197)
lon3 <- (-124.0961)
lat1 <- (43.9956)
lat2 <- (44.0069)
lat3 <- (44.0272)
elev1 <- 76.2
elev2 <- 115.8
elev3 <- 3.7
date1 <- seq(as.Date('2011-01-01'), as.Date('2011-01-10'),by=1)
date2 <- seq(as.Date('2011-01-11'), as.Date('2011-01-20'),by=1)
date3 <- seq(as.Date('2011-01-21'), as.Date('2011-01-30'),by=1)
#creating the df
reprex.data <- NULL
reprex.data$precip <- precip
#inserting NAs randomly (about 15% of values) into the precip vector now, so the other variables are not affected
reprex.data <- as.data.frame(lapply(reprex.data, function(cc) cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]))
#creating the rest of the df (each station's coordinates repeated for its 10 rows)
reprex.data$lon <- rep(c(lon1, lon2, lon3), each = 10)
reprex.data$lat <- rep(c(lat1, lat2, lat3), each = 10)
reprex.data$elev <- rep(c(elev1, elev2, elev3), each = 10)
#assigning the dates in one step keeps the Date class (element-wise assignment into a new column silently converts them to numbers)
reprex.data$date <- c(date1, date2, date3)
#voila
head(reprex.data)
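KFAS expects the response as a matrix (or multivariate `ts`) with one column per station, so the long data frame needs reshaping first. A base-R sketch follows. In the reprex above each station covers a different 10-day window, so no date would have values from more than one station; the hypothetical data below instead give all three stations the same five dates, with made-up precipitation values. The `station` id is derived from the coordinate triple, as an assumption about how stations are identified.

```r
# Hypothetical long-format data shaped like reprex.data, but with all three
# stations observed on the SAME five dates (assumption)
dates <- seq(as.Date("2011-01-01"), by = 1, length.out = 5)
long <- data.frame(
  precip = c(0, 3, NA, 1, 7,   2, NA, 4, 4, 0,   5, 1, 1, NA, 6),
  lon    = rep(c(-123.7500, -124.1197, -124.0961), each = 5),
  lat    = rep(c(43.9956, 44.0069, 44.0272), each = 5),
  elev   = rep(c(76.2, 115.8, 3.7), each = 5),
  date   = rep(dates, times = 3)
)

# One integer id per unique coordinate triple, in order of first appearance
key <- paste(long$lon, long$lat, long$elev)
long$station <- match(key, unique(key))

# Wide matrix: one row per date, one column per station, NA where missing
all_dates <- sort(unique(long$date))
n_station <- max(long$station)
Y <- matrix(NA_real_, nrow = length(all_dates), ncol = n_station,
            dimnames = list(format(all_dates),
                            paste0("station", seq_len(n_station))))
Y[cbind(match(long$date, all_dates), long$station)] <- long$precip
Y
```

Dates with no measurement at a given station simply stay `NA` in the matrix, which is also how KFAS represents missing observations.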