Questions tagged [imputets]

An R package to provide functions for time series missing value replacement (imputation).


It offers implementations of several different imputation algorithms. Beyond the imputation algorithms, the package also provides functions for plotting and printing missing data statistics of a time series.

The package is designed to work with almost all numeric time series inputs.

Imputation Methods

Here is a short overview of available imputation algorithms to choose from:

  • na.interpolation (Missing Value Imputation by Interpolation)
  • na.kalman (Missing Value Imputation by Kalman Smoothing)
  • na.locf (Missing Value Imputation by Last Observation Carried Forward)
  • na.ma (Missing Value Imputation by Weighted Moving Average)
  • na.mean (Missing Value Imputation by Mean Value)
  • na.random (Missing Value Imputation by Random Sample)
  • na.remove (Remove Missing Values)
  • na.replace (Replace Missing Values by a Defined Value)
  • na.seadec (Seasonally Decomposed Missing Value Imputation)
  • na.seasplit (Seasonally Splitted Missing Value Imputation)

    This is a rather broad overview. Most of the functions offer more than one algorithm. For example, na.interpolation can be set to linear, spline, or Stineman ("stine") interpolation.
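The interpolation variants mentioned above can be compared on a small example. This is a minimal sketch, assuming imputeTS 3.0 or later, where the imputation functions are named with an underscore (na_interpolation rather than na.interpolation); the toy series x is made up for illustration.

```r
library(imputeTS)  # assumes imputeTS >= 3.0 (underscore function names)

# Toy series with two gaps (illustrative data only)
x <- ts(c(2, 4, NA, 8, 10, NA, 14))

# The same function, three interpolation variants selected via `option`
x_linear <- na_interpolation(x, option = "linear")
x_spline <- na_interpolation(x, option = "spline")
x_stine  <- na_interpolation(x, option = "stine")

# Each result is a ts of the same length with no NAs left
print(x_linear)
print(x_spline)
print(x_stine)
```

With linear interpolation the two gaps become 6 and 12, the midpoints of their neighbours; the spline and Stineman options fit smooth curves through the observed points instead of straight segments.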

Installation

The imputeTS package is available on CRAN. To install it, run in R:

install.packages("imputeTS")

To install the latest development version from GitHub (may be unstable), run:

library(devtools)
install_github("SteffenMoritz/imputeTS")

Usage

  • Imputation

    To impute (fill in) all missing values in a time series x, run the following command:

    na.interpolation(x)

    The output is the time series x with all NAs replaced by reasonable values.

    This is just one example of an imputation algorithm. In this case, interpolation was the algorithm chosen to calculate the NA replacements. Several other algorithms are available (see "Imputation Methods" above). All imputation functions follow the same naming scheme: na. followed by an algorithm label, e.g. na.mean, na.kalman, ...

  • Plotting

    To plot missing data statistics for a time series x, run the following command: plotNA.distribution(x)

    This is also just one example of a plot. Overall, there are four different types of missing data plots.

  • Printing

    To print statistics about the missing data in a time series x, run the following command: statsNA(x)
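The imputation and printing steps above can be combined to compare algorithms on one series. This is a minimal sketch, assuming imputeTS 3.0 or later, where the imputation functions use underscore names (na_mean, na_locf, na_ma, na_interpolation) instead of the dotted names used above; statsNA keeps its name. The series is invented for illustration.

```r
library(imputeTS)  # assumes imputeTS >= 3.0 (underscore function names)

# Toy series with four missing values (illustrative data only)
x <- ts(c(5, 7, NA, NA, 11, 12, NA, 9, 8, NA, 6, 7))

# Print missing data statistics (NA counts, percentages, gap sizes)
statsNA(x)

# Each imputation function takes the series as its first argument and
# returns an object of the same class with the NAs replaced
imputed <- list(
  mean          = na_mean(x),          # mean of the observed values
  locf          = na_locf(x),          # last observation carried forward
  ma            = na_ma(x, k = 2),     # weighted moving average, k values on each side
  interpolation = na_interpolation(x)  # linear interpolation (default option)
)

# One row per originally missing position, one column per algorithm
replacements <- sapply(imputed, function(y) as.numeric(y)[is.na(x)])
print(replacements)
```

For this series, na_mean fills all four gaps with 8.125 (the mean of the eight observed values), while na_locf fills them with 7, 7, 12 and 8.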


56 questions

6 votes, 5 answers

Impute missing values with ROLLING mean in R

I am new to R and struggling with a problem. I need a function to impute the missing values in a vector according to the mean value of the elements within a window of a given size. However, this window will move because, say my NA is in position 30,…
s1368647

5 votes, 2 answers

Testing for missing values in R

I have a time series data set which has some missing values in it. I wish to impute the missing values but I am unsure as to which method is most appropriate, e.g. linear, spline or stine from the imputeTS package. For the sake of completeness I wish…
TheGoat

3 votes, 3 answers

how to fill missing values in a vector with the mean of value before and after the missing one

Currently I am trying to impute values in a vector in R. The conditions of the imputation are: find all NA values, then check if they have an existing value before and after them, and also check if the value which follows the NA is larger than the…

3 votes, 1 answer

Time series Imputation based on ID

I am working on a time series data. The dataset is: datALL <- read.table(header=TRUE, text=" ID Year Align A01 2017 329 A01 2016 NA A01 2015 NA …
S Das

2 votes, 2 answers

Using function na_ma in a numeric dataframe in R

I am trying to use the function na_ma from library(imputeTS); because I am dealing with missing values in a dataframe by replacing them with the average of the surrounding values. Data…
RMN

2 votes, 0 answers

Imputing based on percentage of NA values

I want to impute temperature values from 6 different weather stations. The data are measured every 30 minutes. I want to impute the values only when there are more than 20 % NA values in a day and month. So I am grouping the values per date/month,…
Max Wfhde

2 votes, 3 answers

fill in blanks with exponential estimates

I'm trying to fill in NA values with numbers that show exponential growth. Below is a data sample of what I'm trying to do. library(tidyverse) expand.grid(X2009H1N1 = "0-17 years", type = "Cases", month =…
user3357059

2 votes, 2 answers

impute missing with interpolation by groups

I am trying to impute missing values (NA) with interpolation by multiple groups. I just subset a simple example: Year ST CC ID MP PS 2002 15 3 3 NA 1.5 2003 15 3 3 NA 1.5 2004 15 3 3 193 …
Peter Chen

2 votes, 0 answers

Using Kalman smoothing in R's KFAS package to impute missing data

I have a data frame (reproducible example at the bottom) containing a column of values representing precipitation volume, a column of date-of-measurement values, and a column each for lat, lon, and elevation coordinates. The data covers 10 years of…
Clayton Glasser

2 votes, 1 answer

Error in na.interpolation(data[, i], option): Input x is not numeric

I have the following problem. I have a data.frame consisting of country "identifier" (letters+numbers), "year" (numbers), "unique identifier" (identifier+year), statistics on "labour market1" (numbers) and statistics on "labour market2" (numbers),…
Ines22

2 votes, 2 answers

What is a suitable impute function for Non Linear TS data?

I'm trying to fill in missing data in R. It's a simple variable, with a date. I'm using the imputeTS package, but when I map the output I can tell the data is off. In Excel, when I use a straight line calculation, it appears to be closer. I want to…
Donal B

1 vote, 1 answer

Does a growth rate variable, calculated with the same interpolated variable, create any problem with panel data in R?

Very thankful in advance. I have panel data in R for some non-consecutive years (1821,1831,1832,1833,1837:1875) and population (pop) for just some of those years. I interpolated those missing values with the "na_interpolation" function, such…

1 vote, 1 answer

How to simulate random values to impute the missing values based on the distribution of available data in pandas?

I have an Age category column in my pandas dataframe, df. In the Age category column, 32% of the values are missing, and I need to do some imputation. I'm thinking of using the distribution of the available data (the other 68%) to impute the missing…
weizer

1 vote, 1 answer

Problems when attempting to use "na_ma" on a list of data frames?

I am a fairly novice R user, but have been trying to do some simple missing value replacement. (Replacing an NA with the mean of the value before and the value after the NA) I have been using the na_ma() function from the imputeTS library and it is…

1 vote, 5 answers

Replace values in one vector with values from other vector(s)

I have a dataframe something like this: id <- c(1, 2, 3, 4, 5, 6, 7) var1 <- c(1, NA, 2, NA, 1, 1, 2) var2 <- c(1, 1, 2, 2, NA, 2, 2) However, how do I manage to create a new vector, which takes the values from var2, and replace it with NAs in var1 and…
R9413