auto.arima not reading csv file dates correctly (Full Code + Dataset)

Question

Here is the full code and dataset:

R code: https://drive.google.com/file/d/16UOuY9HaeGFmNn7POl0Z8vjIxP70xUSv/view?usp=sharing
NASDAQ 100 daily prices dataset: https://drive.google.com/file/d/1XXNOdWFwY_RqZ1vXmyO5Wj-o4Nu6ci1l/view?usp=sharing

The `auto.arima` function simply refuses to read the dates correctly.

Can you please take a look? So far, nothing I found on Stack Overflow, Google, or YouTube has worked. Perhaps I am doing something wrong?

I just want to get this working to play around with forecasts (which in most cases aren't accurate at all, but I am doing it for fun).

Thank you! :)

What's your problem exactly? Is there any error in that code? The `auto.arima` function doesn't actually "read dates". I wonder if you're reading in the data file correctly. You've said it is a csv file but the code has `read_excel`. — Edward, Jul 05 '20 at 21:10
Hi, welcome to Stack Overflow. Ideally your code would be in the post. Please take a look at making a reproducible example https://stackoverflow.com/help/minimal-reproducible-example and further tips [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Mark Neal, Jul 17 '20 at 04:19

Rob Hyndman · Answer 1 · 2020-07-06T06:08:30.873

This has got nothing to do with auto.arima(). You are using ts objects which do not store dates explicitly; instead they store the starting time, ending time and the frequency (the number of observations per time period). The forecast package is designed to handle ts objects.
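A quick way to see this for yourself is to build a small ts object in base R and inspect it. This is a minimal sketch: the prices are made up, and frequency = 252 (roughly the number of trading days in a year) is an assumption for illustration, not something taken from the question's code.

```r
# Illustrative closing prices; these values are made up, not from NDX.csv
close <- c(100, 101.5, 99.8, 102.3, 103.1)

# A ts object keeps the values plus start, end and frequency, nothing else.
# frequency = 252 is an assumed number of trading days per year.
x <- ts(close, start = c(2020, 1), frequency = 252)

# tsp() returns c(start, end, frequency) as plain numbers, not Date values
tsp(x)

# time() gives fractional "years" computed from start and frequency
time(x)

# No Date information is stored anywhere in the object
inherits(time(x), "Date")  # FALSE
```

Any Date column you had in the original data frame is simply dropped when the series is converted, which is why the plots and forecasts show decimal years instead of calendar dates.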

Since you are using tidyverse already, you will probably find it easier to use the newer tsibble class of objects which stores dates explicitly. Here is the same analysis you posted but done using tsibble objects.

```r
library(tidyverse)
library(tsibble)
library(feasts)
#> Loading required package: fabletools
library(fable)

# Read the CSV and index by trading day rather than Date,
# so that weekends and holidays do not show up as missing values
NDX_prices <- read_csv("~/Downloads/NDX.csv") %>%
  mutate(trading_day = row_number()) %>%
  as_tsibble(index = trading_day)
#> Parsed with column specification:
#> cols(
#>   Date = col_date(format = ""),
#>   Close = col_double()
#> )

NDX_prices %>%
  autoplot(Close) + 
  ggtitle("NASDAQ 100 ARIMA") + ylab("Closing Prices")

# Automatically select and fit an ARIMA model (the fable equivalent of auto.arima)
fit_ARIMA <- NDX_prices %>%
  model(arima = ARIMA(Close))

fit_ARIMA %>% report()
#> Series: Close 
#> Model: ARIMA(1,1,0) 
#> 
#> Coefficients:
#>           ar1
#>       -0.3767
#> s.e.   0.0583
#> 
#> sigma^2 estimated as 25768:  log likelihood=-1630.42
#> AIC=3264.83   AICc=3264.88   BIC=3271.88

# Residual diagnostics
fit_ARIMA %>% gg_tsresiduals()

# Forecast one trading day ahead
fcast <- fit_ARIMA %>% forecast(h = 1)

fcast
#> # A fable: 1 x 4 [1]
#> # Key:     .model [1]
#>   .model trading_day           Close  .mean
#>   <chr>        <dbl>          <dist>  <dbl>
#> 1 arima          253 N(10318, 25768) 10318.

fcast %>% autoplot(NDX_prices)
```

Created on 2020-07-06 by the reprex package (v0.3.0)

Notes:

- Your data is a csv file, not an Excel file, so you can't use read_excel(). Use read_csv() instead.
- Because trading does not happen every day, you need to index the series by trading_day (the number of trading days since the start of the series) rather than by Date. Otherwise the series will contain a lot of missing values.
- Similarly, the forecasts are indexed by trading day, not by date, but they can be translated back to dates if you know which days will be traded in the future.
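One way to do that translation is sketched below in base R. It assumes every weekday is a trading day except for holidays you list yourself; the helper name trading_day_to_date(), the last date and the holiday used in the example are all hypothetical, not read from NDX.csv. For exact results you would need the exchange's own trading calendar.

```r
# Hypothetical helper: convert future trading-day indices to calendar dates.
# Assumption: every Monday-Friday is a trading day unless listed in `holidays`.
trading_day_to_date <- function(days, last_date, last_day,
                                holidays = as.Date(character())) {
  # Only indices after the last observed trading day can be mapped this way
  stopifnot(all(days > last_day))
  n_ahead <- max(days) - last_day
  # Generate more calendar days than needed to cover n_ahead trading days
  candidates <- seq(last_date + 1, by = "day", length.out = 2 * n_ahead + 14)
  # %u gives the ISO weekday: 1 = Monday ... 7 = Sunday
  is_weekday <- as.integer(format(candidates, "%u")) <= 5
  future <- candidates[is_weekday & !(candidates %in% holidays)]
  # The k-th trading day after last_day is future[k]
  future[days - last_day]
}

# Example with illustrative values (not read from NDX.csv):
# last observation on Thursday 2020-07-02 at trading_day 252,
# with Friday 2020-07-03 treated as an assumed market holiday
trading_day_to_date(253:255,
                    last_date = as.Date("2020-07-02"),
                    last_day  = 252,
                    holidays  = as.Date("2020-07-03"))
# prints "2020-07-06" "2020-07-07" "2020-07-08"
```

With the fable above you could pass fcast$trading_day, together with max(NDX_prices$Date) and max(NDX_prices$trading_day), to get calendar dates for the forecasts.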

Thank you, Rob! Thanks a lot, man. I am a rookie and learning every day. You explained everything clearly, and now I get it. You expanded my brain! — Daniel, Jul 06 '20 at 04:23
