3

I'm able to do forecasts with an ARIMA model, but when I try to do a forecast for a linear model, I do not get any actual forecasts - it stops at the end of the data set (which isn't useful for forecasting since I already know what's in the data set). I've found countless examples online where using this same code works just fine, but I haven't found anyone else having this same error.

library("stats")
library("forecast")

y <- data$Mfg.Shipments.Total..USA.
model_a1 <- auto.arima(y)
forecast_a1 <- forecast.Arima(model_a1, h = 12)

The above code works perfectly. However, when I try to do a linear model....

model1 <- lm(y ~ Mfg.NO.Total..USA. + Mfg.Inv.Total..USA., data = data )
f1 <- forecast.lm(model1, h = 12)

I get an error message saying that I MUST provide a new data set (which seems odd to me, since the documentation for the forecast package says that it is an optional argument).

f1 <- forecast.lm(model1, newdata = x, h = 12)

If I do this, I am able to get the function to work, but the forecast only predicts values for the existing data - it doesn't predict the next 12 periods. I have also tried using the append function to add additional rows to see if that would fix the issue, but when trying to forecast a linear model, it immediately stops at the most recent point in the time series.

Here's the data that I'm using:

+------------+---------------------------+--------------------+---------------------+
|            | Mfg.Shipments.Total..USA. | Mfg.NO.Total..USA. | Mfg.Inv.Total..USA. |
+------------+---------------------------+--------------------+---------------------+
| 2110-01-01 | 3.59746e+11               | 3.58464e+11        | 5.01361e+11         |
| 2110-01-01 | 3.59746e+11               | 3.58464e+11        | 5.01361e+11         |
| 2110-02-01 | 3.62268e+11               | 3.63441e+11        | 5.10439e+11         |
| 2110-03-01 | 4.23748e+11               | 4.24527e+11        | 5.10792e+11         |
| 2110-04-01 | 4.08755e+11               | 4.02769e+11        | 5.16853e+11         |
| 2110-05-01 | 4.08187e+11               | 4.02869e+11        | 5.18180e+11         |
| 2110-06-01 | 4.27567e+11               | 4.21713e+11        | 5.15675e+11         |
| 2110-07-01 | 3.97590e+11               | 3.89916e+11        | 5.24785e+11         |
| 2110-08-01 | 4.24732e+11               | 4.16304e+11        | 5.27734e+11         |
| 2110-09-01 | 4.30974e+11               | 4.35043e+11        | 5.28797e+11         |
| 2110-10-01 | 4.24008e+11               | 4.17076e+11        | 5.38917e+11         |
| 2110-11-01 | 4.11930e+11               | 4.09440e+11        | 5.42618e+11         |
| 2110-12-01 | 4.25940e+11               | 4.34201e+11        | 5.35384e+11         |
| 2111-01-01 | 4.01629e+11               | 4.07748e+11        | 5.55057e+11         |
| 2111-02-01 | 4.06385e+11               | 4.06151e+11        | 5.66058e+11         |
| 2111-03-01 | 4.83827e+11               | 4.89904e+11        | 5.70990e+11         |
| 2111-04-01 | 4.54640e+11               | 4.46702e+11        | 5.84808e+11         |
| 2111-05-01 | 4.65124e+11               | 4.63155e+11        | 5.92456e+11         |
| 2111-06-01 | 4.83809e+11               | 4.75150e+11        | 5.86645e+11         |
| 2111-07-01 | 4.44437e+11               | 4.40452e+11        | 5.97201e+11         |
| 2111-08-01 | 4.83537e+11               | 4.79958e+11        | 5.99461e+11         |
| 2111-09-01 | 4.77130e+11               | 4.75580e+11        | 5.93065e+11         |
| 2111-10-01 | 4.69276e+11               | 4.59579e+11        | 6.03481e+11         |
| 2111-11-01 | 4.53706e+11               | 4.55029e+11        | 6.02577e+11         |
| 2111-12-01 | 4.57872e+11               | 4.81454e+11        | 5.86886e+11         |
| 2112-01-01 | 4.35834e+11               | 4.45037e+11        | 6.04042e+11         |
| 2112-02-01 | 4.55996e+11               | 4.70820e+11        | 6.12071e+11         |
| 2112-03-01 | 5.04869e+11               | 5.08818e+11        | 6.11717e+11         |
| 2112-04-01 | 4.76213e+11               | 4.70666e+11        | 6.16375e+11         |
| 2112-05-01 | 4.95789e+11               | 4.87730e+11        | 6.17639e+11         |
| 2112-06-01 | 4.91218e+11               | 4.87857e+11        | 6.09361e+11         |
| 2112-07-01 | 4.58087e+11               | 4.61037e+11        | 6.19166e+11         |
| 2112-08-01 | 4.97438e+11               | 4.74539e+11        | 6.22773e+11         |
| 2112-09-01 | 4.86994e+11               | 4.85560e+11        | 6.23067e+11         |
| 2112-10-01 | 4.96744e+11               | 4.92562e+11        | 6.26796e+11         |
| 2112-11-01 | 4.70810e+11               | 4.64944e+11        | 6.23999e+11         |
| 2112-12-01 | 4.66721e+11               | 4.88615e+11        | 6.08900e+11         |
| 2113-01-01 | 4.51585e+11               | 4.50763e+11        | 6.25881e+11         |
| 2113-02-01 | 4.56329e+11               | 4.69574e+11        | 6.33157e+11         |
| 2113-03-01 | 5.04023e+11               | 4.92978e+11        | 6.31055e+11         |
| 2113-04-01 | 4.84798e+11               | 4.76750e+11        | 6.35643e+11         |
| 2113-05-01 | 5.04478e+11               | 5.04488e+11        | 6.34376e+11         |
| 2113-06-01 | 4.99043e+11               | 5.13760e+11        | 6.25715e+11         |
| 2113-07-01 | 4.75700e+11               | 4.69012e+11        | 6.34892e+11         |
| 2113-08-01 | 5.05244e+11               | 4.90404e+11        | 6.37735e+11         |
| 2113-09-01 | 5.00087e+11               | 5.04849e+11        | 6.34665e+11         |
| 2113-10-01 | 5.05965e+11               | 4.99682e+11        | 6.38945e+11         |
| 2113-11-01 | 4.78876e+11               | 4.80784e+11        | 6.34442e+11         |
| 2113-12-01 | 4.80640e+11               | 4.98807e+11        | 6.19458e+11         |
| 2114-01-01 | 4.56779e+11               | 4.57684e+11        | 6.36568e+11         |
| 2114-02-01 | 4.62195e+11               | 4.70312e+11        | 6.48982e+11         |
| 2114-03-01 | 5.19472e+11               | 5.25900e+11        | 6.47038e+11         |
| 2114-04-01 | 5.04217e+11               | 5.06090e+11        | 6.52612e+11         |
| 2114-05-01 | 5.14186e+11               | 5.11149e+11        | 6.58990e+11         |
| 2114-06-01 | 5.25249e+11               | 5.33247e+11        | 6.49512e+11         |
| 2114-07-01 | 4.99198e+11               | 5.52506e+11        | 6.57645e+11         |
| 2114-08-01 | 5.17184e+11               | 5.07622e+11        | 6.59281e+11         |
| 2114-09-01 | 5.23682e+11               | 5.24051e+11        | 6.55582e+11         |
| 2114-10-01 | 5.17305e+11               | 5.09549e+11        | 6.59237e+11         |
| 2114-11-01 | 4.71921e+11               | 4.70093e+11        | 6.57044e+11         |
| 2114-12-01 | 4.84948e+11               | 4.86804e+11        | 6.34120e+11         |
+------------+---------------------------+--------------------+---------------------+

Edit - Here's the code I used for adding new datapoints for forecasting.

library(xts)
library(mondate)

d <- as.mondate("2115-01-01")
d11 <- d + 11
seq(d, d11)
newdates <- seq(d, d11)
new_xts <- xts(order.by = as.Date(newdates))

new_xts$Mfg.Shipments.Total..USA. <- NA
new_xts$Mfg.NO.Total..USA. <- NA
new_xts$Mfg.Inv.Total..USA. <- NA
x <- append(data, new_xts)
  • For the `lm`, did you try `?predict.lm` – C8H10N4O2 Jun 29 '15 at 15:40
  • Yes, predict.lm does the same thing - it only provided predicted values for points where I have the exact values. – user5037511 Jun 29 '15 at 16:06
  • Did you specify a `newdata` argument that had dates beyond the data you fit the model? Take a look at `?predict.lm`. – Gregor Thomas Jun 29 '15 at 16:46
  • Yes, I tried adding dates beyond what were in the model as well and I had the same issue. It would predict / forecast (same result on both functions) NA for any date after the data used to fit the model. – user5037511 Jun 29 '15 at 16:48
  • Are your explanatory variables (right hand side of the formula) present in the new data for the dates you want to predict? – C8H10N4O2 Jun 29 '15 at 17:06
  • Yes, I tried doing it just for the dependent variable and I received an error message. – user5037511 Jun 29 '15 at 17:14
  • Could you show the code where you create a new data set with additional dates and pass that to `predict()` with the linear model? That seems to be where your problem is but you haven't shown any of that code. – Gregor Thomas Jun 30 '15 at 00:23
  • @Gregor - I added an edit to the original post with the code for adding additional dates. – user5037511 Jul 01 '15 at 19:00
  • In your updated code, you define an object `x` that looks like your new data. In your forecast code, you specify `newdata = data`. You should be using `x`. I'm not sure if that's a typo in the transcription to SO or not... it would also be super helpful if you'd share your data using `dput` so that it's easily importable. [See here for reproducibility suggestions](http://stackoverflow.com/q/5963269/903061). – Gregor Thomas Jul 01 '15 at 21:09
  • It's a typo due to me still trying to figure this out on my own and making changes. This is what I have in my current version for forecasting: f1 <- forecast.lm(model1, newdata = x, h = 12). This results in forecasts for everything BUT the last 12 months, which are all NA. – user5037511 Jul 02 '15 at 12:35

2 Answers

1

Not sure if you ever figured this out, but just in case I thought I'd point out what's going wrong.

The documentation for `forecast.lm` describes the `newdata` argument as follows:

"An optional data frame in which to look for variables with which to predict. If omitted, it is assumed that the only variables are trend and season, and h forecasts are produced."

So `newdata` is optional only if trend and season are your only predictors.
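To see why trend and season are the special case, here is a minimal base-R sketch on made-up data (the names `train`, `future`, `trend` and `season` are my own, not from your code). Both predictors are known for any future date, so the future rows can be generated mechanically. As far as I understand it, this is what the forecast package does for you when a model is fitted with `tslm(y ~ trend + season)` and `newdata` is omitted.

```r
# Synthetic monthly series; all values here are invented for illustration.
set.seed(2)
n <- 60
train <- data.frame(
  trend  = 1:n,
  season = factor(rep(month.abb, length.out = n), levels = month.abb)
)
train$y <- 400 + 1.5 * train$trend + 10 * as.integer(train$season) +
  rnorm(n, 0, 5)

fit <- lm(y ~ trend + season, data = train)

# Trend and season for the next 12 periods are deterministic,
# so they can be constructed without any extra information.
h <- 12
future <- data.frame(
  trend  = (n + 1):(n + h),
  season = factor(rep(month.abb, length.out = n + h)[(n + 1):(n + h)],
                  levels = month.abb)
)

pred <- predict(fit, newdata = future)  # 12 numeric forecasts, none NA
```

Your model uses new orders and inventories instead, and those cannot be generated this way.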

The ARIMA model works because it's using lagged values of the time series in the forecast. For the linear model, it uses the given predictors (Mfg.NO.Total..USA. and Mfg.Inv.Total..USA. in your case) and thus needs their corresponding future values; without these, there are no independent variables to predict from.

In the edit, you added those variables to your future dataset, but they still have values of NA for all future points, thus the forecasts are also NA.
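A small sketch of both cases, using base R's `predict()` on synthetic data rather than your actual series (the column names `no` and `inv` and all numbers are invented). The "future" predictor values in the second case are pure assumptions; in practice they would have to come from a plan, a scenario, or a separate forecast of each predictor.

```r
# Synthetic stand-in for the question's data; names and values are hypothetical.
set.seed(1)
n <- 60
train <- data.frame(no = rnorm(n, 450, 40), inv = rnorm(n, 600, 40))
train$y <- 20 + 0.9 * train$no + 0.05 * train$inv + rnorm(n, 0, 5)

fit <- lm(y ~ no + inv, data = train)

# Case 1: future rows exist, but the predictors are NA (as in the edit).
future_na <- data.frame(no = rep(NA_real_, 12), inv = rep(NA_real_, 12))
pred_na <- predict(fit, newdata = future_na)  # every forecast is NA

# Case 2: future rows with assumed predictor values (here simply the
# training means, which is an arbitrary placeholder choice).
future_ok <- data.frame(no  = rep(mean(train$no), 12),
                        inv = rep(mean(train$inv), 12))
pred_ok <- predict(fit, newdata = future_ok, interval = "prediction")
```

`pred_ok` is a 12-row matrix with columns `fit`, `lwr` and `upr`. Passing the same kind of filled-in data frame as `newdata` to `forecast()` should give you forecasts for the 12 future periods as well.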

Gabe
0

Gabe is correct. You need future values of your causals.

You should consider the transfer function modeling process instead of regression (which was developed for use with cross-sectional data). By prewhitening your X variables (i.e., building a model for each one), you can calculate the cross-correlation function to reveal any lead or lag relationship.
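A rough base-R sketch of the prewhitening idea, on simulated data and not the output of the software used for the figures. The AR(1) order, the two-period delay, and all variable names are assumptions made for illustration; with real data you would identify a suitable model for each X first.

```r
# Simulated example: y responds to x two periods later (hypothetical setup).
set.seed(3)
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- c(rep(0, 2), 0.8 * x[1:(n - 2)]) + rnorm(n, 0, 0.5)

# Step 1: model the input series (an AR(1) is assumed here).
fit_x <- arima(x, order = c(1, 0, 0))
phi <- coef(fit_x)[["ar1"]]

# Step 2: apply the same AR filter to both series (prewhitening).
prewhiten <- function(z) {
  as.numeric(stats::filter(z, filter = c(1, -phi),
                           method = "convolution", sides = 1))
}
x_pw <- prewhiten(x)[-1]  # drop the leading NA produced by the filter
y_pw <- prewhiten(y)[-1]

# Step 3: cross-correlate the filtered series.
# ccf(x, y) at lag k estimates cor(x[t + k], y[t]),
# so a peak at a negative lag means x leads y.
cc <- ccf(x_pw, y_pw, lag.max = 10, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
```

Without the filtering step, the autocorrelation in `x` smears the cross-correlation over many lags, which makes the lead or lag hard to read off.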

[Figure: Normalized plot]

It is very apparent from the standardized graph of Y and the two X's that Inv.Total is a lead variable (b**-1): when Inv.Total moves down, so do shipments. In addition, there is an AR seasonal component beyond the causals that is driving the data. There are also a few outliers, so this is a robust solution. I am a developer of the software used here, but this can be run in any tool.

[Figure: Model]

[Figure: Actual and outlier adjusted]

Tom Reilly