I have some data with missing values that I know to be positive. I'm trying to interpolate the missing values using na.interp from the forecast package. However, some of the interpolated values turn out to be negative.

I've tried na.approx from the package zoo, but the approximated values do not agree with the seasonal trend of the time series.

I cannot interpolate in the log domain since some of my observations are 0. Interpolating in the square-root domain somehow produces too many outliers. Is there any other way to interpolate time series while preserving positivity? Any references to other R packages would also be appreciated.

curious
  • Try `na.StructTS` from the zoo package. See help page. – G. Grothendieck Jun 10 '17 at 20:31
  • @G.Grothendieck `na.StructTS` is taking too long on my around-8000-values-long time series, and I have nearly 100 such series. Any way to optimize this, maybe? – curious Jun 10 '17 at 22:25
  • @G.Grothendieck also, passing my data as a zoo series to `na.StructTS` gives me this error: `Error in rowSums(tsSmooth(StructTS(y))[, -2]) : 'x' must be an array of at least two dimensions` – curious Jun 10 '17 at 22:33
  • The error is coming from `rowSums`. Please review [ask] and [mcve]. – G. Grothendieck Jun 10 '17 at 23:18
  • `stats::approx` – M-- Jun 11 '17 at 04:02
  • @Masoud thanks. `approx` worked best for me. The interpolation at some points is not as good as `na.interp`, but at least it preserved positivity :) Since that does answer the question, I'll accept an answer if you post one. – curious Jun 11 '17 at 21:48
  • @curious if you post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), I will post an answer. Without that, it's a little bit hard to make my own dataframe (laziness) ;) – M-- Jun 11 '17 at 22:13

1 Answer

There is the imputeTS package, which focuses specifically on missing values in time series (take a look at this paper).

It works like this:

    library("imputeTS")
    na_kalman(yourTimeSeries)

That's all there is to it.

It offers several time series imputation functions:

  • Imputation by Linear Interpolation
  • Imputation by Spline Interpolation
  • Imputation by Stineman Interpolation
  • Imputation by Structural Model & Kalman Smoothing
  • Imputation by ARIMA State Space Representation & Kalman Smoothing
  • Imputation by Last Observation Carried Forward
  • Imputation by Next Observation Carried Backward
  • Missing Value Imputation by Simple Moving Average
  • Imputation by Linear Weighted Moving Average
  • Imputation by Exponential Weighted Moving Average
  • Missing Value Imputation by Mean Value
  • Seasonally Decomposed Missing Value Imputation
  • Seasonally Splitted Missing Value Imputation
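
For the positivity question in particular, it may help to see why plain linear interpolation (the first item above) is safe: each filled value is a weighted average of the two neighbouring observations, so it can never drop below the smaller of them. The sketch below uses base R's `stats::approx` rather than imputeTS so that it runs without extra packages; the data vector is made up for illustration.

```r
# Made-up non-negative series with gaps (illustration only)
x <- c(0, 0, NA, NA, 5, 9, NA, 1, 0, NA, 0)

idx <- seq_along(x)   # time index
obs <- !is.na(x)      # which points are observed

# Linear interpolation at every index, using only the observed points.
# Note: with the default rule = 1, leading/trailing NAs would stay NA.
lin <- approx(idx[obs], x[obs], xout = idx)$y

# Each filled value lies between its two neighbours, hence is >= 0
print(lin)
```

The trade-off is the one described in the question's comments: linear interpolation ignores seasonality, so it keeps the values non-negative at the cost of a cruder fit than the model-based methods.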

Some of these functions are more advanced than others. I would try the package's na_kalman() function for this task. Its results may already satisfy your constraints. If they do not, you need to transform the data before performing the imputation (as explained below).

In general, if you want the imputed values to stay within given bounds, this transformation approach may help:

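    library("imputeTS")

    # Bounds: every observed value must lie strictly between a and b,
    # otherwise the log below yields -Inf, Inf or NaN
    a <- 50
    b <- 400

    # Transform the data from (a, b) to the whole real line (scaled logit)
    y <- log((myTimeSeries - a) / (b - myTimeSeries))

    # Impute on the unbounded scale
    imputations <- na_kalman(y)

    # Back-transform (inverse logit); results always fall inside (a, b)
    imputationsBack <- (b - a) * exp(imputations) / (1 + exp(imputations)) + a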
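
To check that the transformation really keeps the imputations inside the bounds, here is a small self-contained sketch. It is only an illustration under assumptions: the data are made up, the helper names `to_unbounded` and `from_unbounded` are hypothetical, and base R's `stats::spline` stands in for `na_kalman()` so the example runs without imputeTS. A spline is used on purpose because it can overshoot, which is exactly the situation the transformation guards against.

```r
# Bounds as above; the made-up data lie strictly inside (a, b)
a <- 50
b <- 400
x <- c(120, 180, NA, NA, 310, 390, NA, 90, 60, NA, 75)

# Hypothetical helpers: scaled logit and its inverse.
# qlogis(p) = log(p / (1 - p)), so this equals log((x - a) / (b - x)).
to_unbounded   <- function(x, a, b) qlogis((x - a) / (b - a))
from_unbounded <- function(z, a, b) a + (b - a) * plogis(z)

z   <- to_unbounded(x, a, b)
idx <- seq_along(z)
obs <- !is.na(z)

# Stand-in imputation on the unbounded scale (NOT na_kalman):
# a cubic spline through the observed points
z_filled <- spline(idx[obs], z[obs], xout = idx)$y

# Back on the original scale, every value is within the bounds
x_filled <- from_unbounded(z_filled, a, b)
print(range(x_filled))
```

Note that this does not directly solve the case where observations sit exactly on a bound (such as zeros with a lower bound of 0), since those map to infinite values on the transformed scale.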
Steffen Moritz