
I am new to modeling in R, so I'm stumbling a bit...

I have a model in EViews that I have to translate to R and then extend. The model is a multiple OLS regression with AR(1) residuals. I implemented it like this:

model1 <- lm(y ~ x1 + x2 + x3, data)        # first-stage OLS
data$e <- dplyr::lag(residuals(model1), 1)  # residuals lagged by one period

model2 <- lm(y ~ x1 + x2 + x3 + e, data)    # refit with the lagged residual as a regressor

My issue is the same as in this thread, and I expected it: while the parameter estimates are similar, they differ enough that I cannot use them.

I am planning to use `arima` from the stats package, but the problem is the implementation. How do I put an AR(1) on the residuals while keeping the other variables as they are?
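
For what it's worth, the two-step fit above is essentially a single iteration of the Cochrane-Orcutt procedure, and iterating it to convergence may get closer to what EViews reports for OLS with AR(1) errors. A minimal base-R sketch, assuming `data` has no missing values (the variable names are the `y`, `x1`, `x2`, `x3` from the question):

# assumes data has columns y, x1, x2, x3 with no NAs (names from the question)
rho <- 0
for (i in 1:100) {
  qd   <- function(v) v[-1] - rho * v[-length(v)]   # quasi-difference with the current rho
  star <- as.data.frame(lapply(data[c("y", "x1", "x2", "x3")], qd))
  b    <- coef(lm(y ~ x1 + x2 + x3, data = star))
  b[1] <- b[1] / (1 - rho)                          # the transformed intercept is alpha * (1 - rho)
  e    <- data$y - as.matrix(cbind(1, data[c("x1", "x2", "x3")])) %*% b
  rho_new <- sum(e[-1] * e[-length(e)]) / sum(e[-length(e)]^2)
  if (abs(rho_new - rho) < 1e-8) break              # stop once rho has converged
  rho <- rho_new
}
b    # coefficient estimates on the original scale
rho  # estimated AR(1) coefficient of the residuals

The CRAN package `orcutt` implements the same procedure (`cochrane.orcutt()`) if you prefer not to roll your own.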

  • This seems more like a statistics question than a programming question. Although if you're looking for an arima solution, my bet is `arima(y, xreg = cbind(x1, x2, x3), order = c(1, 0, 0))` would do the job. However, if your problem is a multiple OLS, then using OLS seems completely sensible. You should likely check your residuals for [autocorrelation](https://www.rdocumentation.org/packages/tseries/versions/0.1-2/topics/acf) and [partial acf](https://www.rdocumentation.org/packages/tseries/versions/0.1-2/topics/pacf) (a sketch of these checks follows the comments), and if that's not a problem, then you're likely alright using OLS. – Oliver Jul 17 '20 at 06:54
  • Yeah, autocorrelation is the problem, that's why I am using AR(1) on residuals. – Vesnič Jul 23 '20 at 12:30
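
A minimal sketch of the diagnostic checks suggested in the first comment, using `acf`/`pacf` from the stats package on `model1` from the question:

res <- residuals(model1)   # residuals of the plain OLS fit
acf(res)                   # a slowly decaying ACF indicates autocorrelation
pacf(res)                  # a single spike at lag 1 is consistent with AR(1) errors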

1 Answer


Provided I understood you correctly, you can supply external regressors to your `arima` model through the `xreg` argument.

You don't provide sample data, so I don't have anything to play with, but your model should translate to something like

model <- arima(data$y, xreg = as.matrix(data[, c("x1", "x2", "x3")]), order = c(1, 0, 0))

Explanation: the first argument, `data$y`, contains your time series. `xreg` contains your external regressors as a matrix, with each column containing as many observations for that regressor as you have time points. `order = c(1, 0, 0)` specifies an AR(1) model.
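
Note that `arima` prints coefficients and standard errors but no p-values. If you want to check significance, a minimal sketch of approximate z-tests built from the documented `coef` and `var.coef` components of the fit:

est <- model$coef                  # point estimates, including the ar1 term
se  <- sqrt(diag(model$var.coef))  # standard errors from the estimated covariance matrix
z   <- est / se
round(data.frame(estimate = est, std.error = se, p.value = 2 * pnorm(-abs(z))), 4)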

Maurits Evers
  • That did help a bit, but there is still some difference in the parameters (in the 3rd or 4th digit). The bigger problem is that some regressors estimated with `arima` became less significant or even insignificant. The reason is probably that `arima` uses MLE rather than the OLS used in EViews. – Vesnič Jul 23 '20 at 12:27
  • 1
    @Vesnič I'd say using MLE is the more robust way (amongst other things, OLS can handle an AR process but not an MA process, see e.g. [here](https://stats.stackexchange.com/questions/321442/why-is-maximum-likelihood-used-for-arima-instead-of-least-squares) for details). And yes, MLE and OLS would give you slightly different estimates (assuming normally distributed residuals). There's `ar.ols` that allows you to fit an AR model to data using OLS, but the method does not allow for external regressors. So the bottom line is that you'd have to accept the small differences in estimates. – Maurits Evers Jul 24 '20 at 02:47
  • I realized I missed an important component that was used in EViews: the HAC (Newey-West) covariance method. If I do not use it in EViews, the probabilities come out approximately the same as in R (a sketch of the R equivalent follows these comments). – Vesnič Jul 24 '20 at 09:56
  • @Vesnič The Newey-West covariance method accounts for heteroskedasticity and autocorrelation in the residuals when using OLS for parameter estimation. I've not really come across this method anywhere other than in econometrics-related areas. TBH, I'm very skeptical of this kind of post-hoc correction; better to understand and account for heteroskedasticity and autocorrelation in the residuals in the model directly. This might be something interesting to raise on [Cross Validated](https://stats.stackexchange.com/), where you have the real stats experts. – Maurits Evers Jul 24 '20 at 12:54
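
For completeness: if the goal is to reproduce the EViews HAC output discussed in the comments, one common route in R is computing Newey-West standard errors for the OLS fit. A minimal sketch, assuming the `sandwich` and `lmtest` packages are available, applied to `model1` from the question:

library(sandwich)  # provides the NeweyWest() HAC covariance estimator
library(lmtest)    # provides coeftest() for tests with a user-supplied covariance

coeftest(model1, vcov. = NeweyWest(model1))  # same OLS estimates, HAC-corrected standard errors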