
I'm creating a forecast with the hts package, but before I get that far I need to clean the data for outliers and missing values.

For this I thought of using the tsclean function in the forecast package. My data is stored in a data frame with multiple columns (one time series per column) that I want to clean. I can get the function to work with a single time series, but since I have quite a lot of them, I'm looking for a smart way to do this.

When running the code:

SFA5 <- ts(SFA4, frequency=12, start=c(2012,1), end=c(2017,10)) 
ggt <- tsclean(SFA5[1:70, 1:94], replace.missing = TRUE)

I get this error message:

Error in na.interp(x, lambda = lambda) : The time series is not univariate.

The data is here:

https://www.dropbox.com/s/dow2jpuv5unmtgd/Data1850.xlsx?dl=0

My question is: what am I doing wrong, or is the only solution to loop over the columns?

Jan Boldt

1 Answer


The error message indicates that tsclean accepts only a univariate time series as its first argument. So, as you might have guessed, you need to apply tsclean to each column separately.

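library(forecast)
# sapply() over a matrix visits single cells, so iterate over column indices instead.
# SFA5[, i] keeps the ts attributes (frequency, start) that tsclean relies on.
ggt <- sapply(seq_len(ncol(SFA5)), function(i) tsclean(SFA5[, i]))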
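To try the column-wise idea on data you can inspect, here is a minimal sketch on a small synthetic multivariate series. The data frame `demo` and its columns are made up for illustration and are not the asker's SFA4. The sketch iterates over column indices, because `sapply()` applied directly to a matrix visits individual cells rather than columns, and it assumes the forecast package is installed.

```r
library(forecast)

# Hypothetical monthly data: 3 series, 70 observations each
set.seed(1)
n <- 70
demo <- data.frame(
  a = 100 + rnorm(n),
  b = 50 + rnorm(n),
  c = 10 + rnorm(n)
)
demo$a[10] <- NA    # inject a missing value
demo$b[25] <- 500   # inject an obvious outlier

demo_ts <- ts(demo, frequency = 12, start = c(2012, 1))

# Clean one column at a time; demo_ts[, i] is a univariate ts
cleaned <- sapply(seq_len(ncol(demo_ts)), function(i) tsclean(demo_ts[, i]))

# sapply() returns a plain matrix, so restore names and ts attributes
colnames(cleaned) <- colnames(demo_ts)
cleaned_ts <- ts(cleaned, frequency = frequency(demo_ts), start = start(demo_ts))

dim(cleaned_ts)     # same shape as the input
anyNA(cleaned_ts)   # check whether any missing values remain
```

Rebuilding the `ts` object at the end is optional, but it keeps the monthly index attached if the cleaned series are passed on to further time series functions.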
markus
  • Thank you for your assistance. I'm pretty new to R, so that really did save me from doing a lot of "looping" :-) – Jan Boldt Nov 27 '17 at 08:16
  • You're welcome. For this kind of task you might also want to read this great post about the `apply` family [here](https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family/7141669). Also check out the `map` functions from the `purrr` package. – markus Nov 27 '17 at 08:46
  • Perfect. Thank you for the references. – Jan Boldt Nov 27 '17 at 17:50