
I am trying to impute NA values in a univariate time series with the imputeTS package in R, and I have noticed something strange when I do the imputation by Kalman smoothing with na_kalman().

My data are daily average temperatures, so they resemble the pseudo-data in the code below, which simulates 2 years of numeric data with NAs:

tseries <- ts(sample(c(1:10, NA), 730, replace = TRUE), start = 1990, frequency = 365)
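Because sample() gives a different series on every run, a seeded variant can make the example reproducible. This is a minimal sketch, assuming an arbitrary seed of 42 that is not part of the original code; the values are still simulated and are not the real temperature data.

```r
# Seeded version of the simulation above; the seed value is an arbitrary assumption.
set.seed(42)
tseries <- ts(sample(c(1:10, NA), 730, replace = TRUE),
              start = 1990, frequency = 365)

# Quick checks on what was generated
length(tseries)       # number of observations
frequency(tseries)    # observations per period, as set above
sum(is.na(tseries))   # how many values are missing

# dput() prints a copy-pasteable representation, here of the first 10 values
dput(as.numeric(tseries)[1:10])
```

Calling dput(tseries) on the whole object would give a version that others can paste directly into their own session.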

Now for the strange part: whenever I pass this time series to na_kalman(), it seems to crash my R session.

library(imputeTS)
kal.imp <- na_kalman(tseries)  # fails

However, if I use the same data as either a numeric vector or a time series with frequency 1, it seems to work just fine. This suggests that the problem is somehow related to the frequency of the time series.

The same thing happens if I use Kalman smoothing as the algorithm in na_seadec(), regardless of whether find_frequency is TRUE or FALSE:
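The two variants described as working can be sketched as follows. This assumes the imputeTS package is installed and uses an arbitrary seed (42) for reproducibility; the object names vec.imp, ts1 and ts1.imp are only illustrative.

```r
library(imputeTS)

# Arbitrary seed (assumption) so the simulated series is the same on every run
set.seed(42)
tseries <- ts(sample(c(1:10, NA), 730, replace = TRUE),
              start = 1990, frequency = 365)

# Variant 1: the same values as a plain numeric vector (ts attributes dropped)
vec.imp <- na_kalman(as.numeric(tseries))

# Variant 2: the same values as a time series with frequency 1
ts1 <- ts(as.numeric(tseries), start = 1990, frequency = 1)
ts1.imp <- na_kalman(ts1)

# Check whether any NAs remain after imputation
anyNA(vec.imp)
anyNA(ts1.imp)
```

Both calls drop the frequency = 365 information, so no seasonal component with a period of 365 is involved in the model that na_kalman() fits.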

sd.kal.imp.false <- na_seadec(tseries, algorithm = "kalman", find_frequency = FALSE)  # fails

sd.kal.imp.true <- na_seadec(tseries, algorithm = "kalman", find_frequency = TRUE)  # also fails

Can anyone help me understand why this is happening?

  • Hi, welcome to StackOverflow. Could you provide some data (real or simulated) that reproduces your error? This lets users figure out what is going wrong. See: https://stackoverflow.com/a/5963610/5805670 – slamballais Apr 02 '20 at 17:22
  • Hey @Laterow. Thanks for pointing out that omission. I just went back and added some random data of the same type, after checking to make sure the problem persists. Unfortunately, it does. – eyehearyou Apr 02 '20 at 17:40
  • Can you provide an example without sample()? (Since it is random sampling, everybody will have a different dataset.) I just tested it and so far it hasn't crashed - but it is still running. That is not necessarily a problem - unfortunately the na_kalman function is extremely slow for longer time series. For time series of up to 150 observations the na_kalman runtime is usually quite acceptable. Maybe just make a dput() of your tseries object and post it here, then we will all be working on the same series. I forgot one thing: welcome to Stack Overflow :) – Steffen Moritz Apr 02 '20 at 20:24
