
I've built a non-linear time series regression model in R that I would like to write down as an equation, so that I can back-test it against my data in an Excel spreadsheet. I created a ts object and fitted the model with the tslm function, as shown below:

model16 <- tslm(production ~ date + I(date^2) + I(date^3) +
                   I(temp_neg_32^3) +
                   I(humidity_avg^3) +
                   I(dew_avg^3) +
                  below_freezing_min, 
                data = production_temp_no_outlier.ts)

I find the coefficients for each variable in the model by using the following code:

summary(model16)

The output is below:

[Image: summary(model16) output]

So, my understanding is that the equation of my model should be:

y = -7924000000 + 1268000*date - 67.62*(date^2) + 0.001202*(date^3) +
    0.04395*(temp_neg_32^3) + 0.008658*(humidity_avg^3) - 0.03762*(dew_avg^3) - 11930*below_freezing_min

However, whenever I plug the data into this equation, the output is completely off: it has nothing in common with the fitted curve that I plot in R from this model. So I am clearly doing something wrong. I would be very grateful if someone could point out my errors!

cubarto
  • It would help if you can [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some or all of the data `production_temp_no_outlier.ts` in plain text format. Also include all the relevant code: _e.g._ the library (forecast?) which contains the `tslm` function. – neilfws Feb 07 '22 at 22:45

1 Answer


Regression used this way doesn't give you an exact fit; it gives you the line of best fit. What is the coefficient of determination (also known as explained variance, or R^2)?

Take a look at this set of data (somewhat modeled after your example).

library(forecast)
library(tidyverse)

data("us_change", package = "fpp3")

fit <-  tslm(Production~Savings + I(Savings^2) + I(Savings^3) + I(Income^3) + Unemployment,
             data = ts(us_change))
summary(fit)
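
The coefficient of determination can be read straight off the summary object. A minimal sketch, using base R's `lm()` and the built-in `mtcars` data as stand-ins (both are assumptions for illustration, not the data from the question); a `tslm` fit is summarised the same way, so the same fields should be available there:

```r
# Stand-in model: base lm() on the built-in mtcars data (illustrative only)
demo_fit <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
demo_sum <- summary(demo_fit)

r2     <- demo_sum$r.squared      # share of variance explained
adj_r2 <- demo_sum$adj.r.squared  # penalised for the number of predictors

# R^2 is 1 - RSS/TSS, so it can also be recomputed by hand
rss <- sum(residuals(demo_fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_by_hand <- 1 - rss / tss

print(c(r2 = r2, adj_r2 = adj_r2, r2_by_hand = r2_by_hand))
```

An R^2 well below 1 means the equation will miss individual observations by a visible margin even when it is written down correctly.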

Here I extracted the coefficients, so I can show you a bit more of what I mean. Then I created a function that calculates the outcome of the regression equation.

cFit <- coefficients(fit)
#   (Intercept)       Savings  I(Savings^2)  I(Savings^3)   I(Income^3)  Unemployment 
#  5.221684e-01  6.321979e-03 -2.472784e-04 -6.376422e-06  7.029079e-03 -3.144743e+00  
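
When the equation is moved to a spreadsheet, the number of digits kept for each coefficient may matter: `summary()` prints rounded values, while `coefficients()` returns them at full precision. A minimal sketch, using made-up data and hypothetical names (`x`, `y`, `demo_fit`, `poly_eq`) rather than the data from the question, compares the two:

```r
# Hypothetical data: a cubic trend in a fairly large-valued regressor x
set.seed(1)
x <- seq(100, 200, by = 1)
y <- 2e-4 * x^3 - 0.05 * x^2 + 3 * x + rnorm(length(x), sd = 5)
demo_fit <- lm(y ~ x + I(x^2) + I(x^3))

# Evaluate the polynomial by hand from a coefficient vector
poly_eq <- function(b) b[[1]] + b[[2]] * x + b[[3]] * x^2 + b[[4]] * x^3

b_full    <- coef(demo_fit)      # full double precision
b_rounded <- signif(b_full, 4)   # roughly what a printed summary shows

# Largest gap between the hand-built equation and the model's fitted values
err_full    <- max(abs(poly_eq(b_full)    - fitted(demo_fit)))
err_rounded <- max(abs(poly_eq(b_rounded) - fitted(demo_fit)))
print(c(err_full = err_full, err_rounded = err_rounded))

# Coefficients with enough digits to paste into a spreadsheet
print(format(b_full, digits = 15))
```

With large regressor values (a numeric date raised to the third power, say), the individual terms can be huge and nearly cancel each other, so a small rounding error in one coefficient may become a large error in the sum.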


regFun <- function(cFit, data){
  # Evaluate the regression equation; every predictor is multiplied by its own coefficient
  with(data,
       cFit[[1]] + cFit[[2]] * Savings + cFit[[3]] * Savings^2 + cFit[[4]] * Savings^3 +
         cFit[[5]] * Income^3 + cFit[[6]] * Unemployment)
}

Here are some examples of the predicted outcome versus the actual outcome.

fitOne <- regFun(cFit, us_change[1,])

us_change[1,]$Production
# [1] -2.452486 

fitTwo <- regFun(cFit, us_change[2,])

us_change[2,]$Production
# [1] -0.5514595 

fitThree <- regFun(cFit, us_change[3,])

us_change[3,]$Production
# [1] -0.3586518 

The gap between each predicted value and the actual value is the part of the production volume that the inputs I provided don't explain.
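
One way to check a hand-written equation before moving it to a spreadsheet is to compare it with what the model object itself reports. A minimal sketch, again on the built-in `mtcars` data with hypothetical names (`demo_fit`, `by_hand`, `by_matrix`); it is not the model from the question:

```r
# Stand-in model on built-in data (illustrative only)
demo_fit <- lm(mpg ~ wt + I(wt^2) + I(hp^3) + qsec, data = mtcars)
b <- coef(demo_fit)

# Hand-written equation: every term carries its own coefficient
by_hand <- with(mtcars,
  b[["(Intercept)"]] + b[["wt"]] * wt + b[["I(wt^2)"]] * wt^2 +
    b[["I(hp^3)"]] * hp^3 + b[["qsec"]] * qsec)

# Same calculation via the design matrix R built for the fit
by_matrix <- drop(model.matrix(demo_fit) %*% b)

# Both should agree with the values the model object reports
print(all.equal(unname(by_hand), unname(fitted(demo_fit))))
print(all.equal(unname(by_matrix), unname(predict(demo_fit, newdata = mtcars))))
```

If the hand-written version disagrees with `fitted()`, the equation has a transcription problem (a missing coefficient, a wrong sign, a rounded value); if it agrees, any remaining gap to the observed data is residual error rather than a mistake in the equation.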

Now look at what happens when I graph this:

plt <- ggplot(data = us_change %>% 
                mutate(Regression = regFun(cFit, us_change)),
       aes(x = Production)) +
  geom_point(aes(y = Savings, color = "Savings")) +
  geom_point(aes(y = Savings^2, color = "Savings^2")) + 
  geom_point(aes(y = Savings^3, color = "Savings^3")) +
  geom_point(aes(y = Income^3, color = "Income^3")) +
  geom_point(aes(y = Unemployment, color = "Unemployment")) +
  geom_line(aes(y = Regression, color = "Regression")) +  # regression line
  scale_color_viridis_d(end = .8) + theme_bw()

plotly::ggplotly(plt)

[Image: plot of the predictors and the regression line against Production]

The regression equation output is the black line. It's the best fit, but there are values that are not represented all that well.

If you look closer, it's not a straight line either.

[Image: zoomed-in view of the regression line]
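
"Best fit" here means the coefficients minimise the sum of squared residuals; it does not mean the residuals are zero. A minimal sketch of that idea, using the built-in `mtcars` data and a hypothetical helper `rss()` (neither comes from the question):

```r
# Stand-in model on built-in data (illustrative only)
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(demo_fit)

# Residual sum of squares for an arbitrary coefficient vector
rss <- function(beta) {
  pred <- beta[[1]] + beta[[2]] * mtcars$wt + beta[[3]] * mtcars$hp
  sum((mtcars$mpg - pred)^2)
}

rss_fit    <- rss(b)                  # at the fitted coefficients
rss_nudged <- rss(b + c(0.5, 0, 0))   # intercept shifted slightly
print(c(rss_fit = rss_fit, rss_nudged = rss_nudged))

# Individual residuals stay non-zero even at the optimum
print(summary(residuals(demo_fit)))
```

Any change to the fitted coefficients makes the total squared error larger, yet individual points can still sit well away from the line.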

Dharman
Kat