An R package to provide functions for time series missing value replacement (imputation).
imputeTS is an R package for time series missing data replacement (imputation).
It offers implementations of several imputation algorithms. Beyond imputation, the package also provides functions for plotting and printing statistics about the missing data in a time series.
The package is designed to work with almost all numeric time series inputs:
- Base R data types such as vector, data.frame and matrix
- ts objects from base R
- Advanced time series objects such as zoo and xts
Imputation Methods
Here is a short overview of available imputation algorithms to choose from:
- na.interpolation (Missing Value Imputation by Interpolation)
- na.kalman (Missing Value Imputation by Kalman Smoothing)
- na.locf (Missing Value Imputation by Last Observation Carried Forward)
- na.ma (Missing Value Imputation by Weighted Moving Average)
- na.mean (Missing Value Imputation by Mean Value)
- na.random (Missing Value Imputation by Random Sample)
- na.remove (Remove Missing Values)
- na.replace (Replace Missing Values by a Defined Value)
- na.seadec (Seasonally Decomposed Missing Value Imputation)
- na.seasplit (Seasonally Splitted Missing Value Imputation)
This is only a broad overview: most of these functions offer more than one algorithm. For example, na.interpolation can be set to linear, Stineman ("stine") or spline interpolation.
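To make the idea of one function with several algorithm options more concrete, here is a minimal base R sketch. `interpolate_na()` is a hypothetical helper, not the imputeTS implementation; it uses base R's `approx()` for the linear option and `spline()` for the spline option, and it leaves out Stineman interpolation, which would need the separate stinepack package.

```r
# Hypothetical helper (not part of imputeTS): fill the NAs of a numeric
# vector by interpolating over the observed points, using base R only.
interpolate_na <- function(x, option = c("linear", "spline")) {
  option <- match.arg(option)
  idx <- seq_along(x)
  obs <- !is.na(x)
  if (sum(obs) < 2) stop("need at least two observed values")
  filled <- if (option == "linear") {
    # approx() connects neighbouring observations with straight lines;
    # rule = 2 repeats the end values for leading/trailing NAs
    approx(idx[obs], x[obs], xout = idx, rule = 2)$y
  } else {
    # spline() fits a cubic interpolating spline through the observations
    spline(idx[obs], x[obs], xout = idx, method = "fmm")$y
  }
  # only the missing positions are overwritten; observed values stay as-is
  x[!obs] <- filled[!obs]
  x
}

x <- c(1, 4, NA, 16, NA, 36)
interpolate_na(x, "linear")  # 1 4 10 16 26 36
interpolate_na(x, "spline")
```

For this series the linear option fills positions 3 and 5 with 10 and 26, while the spline option follows the curvature of the observed values and therefore produces different fills.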
Installation
The imputeTS package is available on CRAN. To install it, run in R:
install.packages("imputeTS")
To install the latest development version from GitHub (which may be unstable), run:
library(devtools)
install_github("SteffenMoritz/imputeTS")
Usage
- Imputation
To impute (fill in) all missing values in a time series x, run the following command:
na.interpolation(x)
The output is the time series x with all NAs replaced by reasonable values. This is just one example of an imputation algorithm; here interpolation is used to calculate the NA replacements. Several other algorithms are available (see the section "Imputation Methods"). All imputation functions follow the same naming scheme: na. followed by an algorithm label, e.g. na.mean, na.kalman, ...
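The choice of algorithm matters, because different na.* functions can fill the same gap quite differently. As an illustration only, the hypothetical base R helpers below mimic the ideas behind mean imputation and last observation carried forward on a plain numeric vector; they are simplified sketches, not the package's code, and they ignore the extra options the real functions offer.

```r
# Hypothetical base-R helpers (not the imputeTS implementations) that mimic
# the idea behind two imputation algorithms on a plain numeric vector.

# Mean imputation: every NA gets the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Last observation carried forward: each NA takes the most recent
# non-missing value; leading NAs (nothing to carry) are left as NA.
impute_locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

x <- c(3, NA, NA, 9, NA, 6)
impute_mean(x)  # 3 6 6 9 6 6  (6 is the mean of 3, 9 and 6)
impute_locf(x)  # 3 3 3 9 9 6
```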
- Plotting
To plot missing data statistics for a time series x, run the following command:
plotNA.distribution(x)
This is also just one example of a plot; overall there are four different types of missing data plots (see also the section "Missing Data Plots").
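As a rough idea of what such a plot conveys, here is a minimal base R sketch. `plot_na_positions()` is a hypothetical helper and does not reproduce the look or options of plotNA.distribution; it draws the series as a line, which breaks at missing values, and marks each missing time point with a red vertical line.

```r
# Hypothetical base-R sketch (not plotNA.distribution itself): draw the
# series and mark each missing time point so gaps are easy to spot.
plot_na_positions <- function(x, main = "Distribution of NAs") {
  na_idx <- which(is.na(x))
  # type = "l" leaves the line broken wherever a value is NA
  plot(seq_along(x), x, type = "l", xlab = "Time", ylab = "Value", main = main)
  # one red vertical line per missing observation
  if (length(na_idx) > 0) abline(v = na_idx, col = "red")
  # return the NA positions invisibly so they can be reused
  invisible(na_idx)
}

x <- c(5, 6, NA, NA, 8, 7, NA, 9)
plot_na_positions(x)
```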
- Printing
To print statistics about the missing data in a time series x, run the following command:
statsNA(x)
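The kind of numbers such a summary reports can be sketched in a few lines of base R. `na_summary()` is a hypothetical helper, not statsNA itself, and covers only a few example statistics: the number and share of NAs, the number of gaps (runs of consecutive NAs, found with `rle()`) and the length of the longest gap.

```r
# Hypothetical base-R sketch (not statsNA itself): a few summary numbers
# about the missing values in a series.
na_summary <- function(x) {
  missing <- is.na(x)
  runs <- rle(missing)                      # consecutive runs of NA / non-NA
  gap_lengths <- runs$lengths[runs$values]  # lengths of the NA runs only
  list(
    length      = length(x),
    n_na        = sum(missing),
    pct_na      = 100 * mean(missing),
    n_gaps      = length(gap_lengths),
    longest_gap = if (length(gap_lengths) > 0) max(gap_lengths) else 0L
  )
}

x <- c(1, NA, NA, 4, 5, NA, 7, 8)
str(na_summary(x))
```

For the example series this reports 3 NAs (37.5 %) spread over 2 gaps, the longest of which is 2 observations long.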
Other resources
- "imputeTS: Time Series Missing Value Imputation in R", scientific article in The R Journal
- CRAN Task View on Time Series Analysis
- How to cite imputeTS in articles