
I have

library(ISLR)
attach(Wage)

# Polynomial Regression and Step Functions

# orthogonal polynomial coding (the default, raw = FALSE)
fit <- lm(wage ~ poly(age, 4), data = Wage)
coef(summary(fit))

# raw coding: age, age^2, age^3, age^4
fit2 <- lm(wage ~ poly(age, 4, raw = T), data = Wage)
coef(summary(fit2))

plot(age, wage)
lines(20:350, predict(fit,  newdata = data.frame(age = 20:350)), lwd = 3, col = "darkred")
lines(20:350, predict(fit2, newdata = data.frame(age = 20:350)), lwd = 3, col = "darkred")

The prediction lines appear to be identical, so why are the coefficients so different? How do you interpret them with raw=T and with raw=F?

I see that the coefficients produced with poly(...,raw=T) match the ones with ~age+I(age^2)+I(age^3)+I(age^4).

If I want to use the coefficients to get the prediction "manually" (without using the predict() function), is there anything I should pay attention to? How should I interpret the coefficients of the orthogonal polynomials from poly()?

ECII
  • Did you read this: http://www.inside-r.org/r-doc/stats/poly ? raw determines whether orthogonal polynomials are used. – Haochen Wu May 02 '15 at 08:08
  • Yes I did. What I probably need is a small explanation regarding orthogonal polynomials in model building. – ECII May 02 '15 at 08:09
  • I think it's more about how R stores the model internally, and it makes little difference when it comes to prediction; only numerical precision is of concern. But I'm not a statistician, and this kind of question may be a better fit for Mathematics Stack Exchange. – Haochen Wu May 02 '15 at 08:15
  • Well, it affects the coefficients dramatically, so I don't think it's just internal. – ECII May 02 '15 at 08:16
  • http://en.wikipedia.org/wiki/Orthogonal_polynomials Wikipedia has a page about it. It's kind of like changing the axis system: your coordinates change dramatically, but the vector stays the same. (Probably not the best analogy, but again I'm not a mathematician.) – Haochen Wu May 02 '15 at 08:18
  • Did you find out which orthogonal polynomials it uses? – Charlie Parker Dec 04 '17 at 17:01
  • What I gather from these answers is: If you want to, say, publish the equation for your polynomial such that others could plug in (raw) values of the predictors and get a prediction that makes sense... you want to use the `raw=T` option. However, the p-values from the `raw=F` output are more informative. Good luck explaining that to non-technical reviewers/readers! – Brian D Mar 01 '18 at 20:46
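On converting between the two codings (a small sketch of my own, not from the thread, using fit and fit2 from the question): since both codings describe the same fitted degree-4 curve, you can recover the publishable raw coefficients from the orthogonal fit by evaluating it at five distinct points and solving the resulting Vandermonde system:

# evaluate the orthogonal fit at 5 distinct points and solve for the
# coefficients of 1, age, age^2, age^3, age^4 (exact for a degree-4 fit)
xs <- 1:5
ys <- predict(fit, newdata = data.frame(age = xs))
solve(outer(xs, 0:4, `^`), ys)   # matches coef(fit2) up to rounding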

2 Answers


By default, with raw = FALSE, poly() computes an orthogonal polynomial. It internally sets up the model matrix with the raw coding x, x^2, x^3, ... first and then orthogonalizes and rescales the columns so that each column is orthogonal to the ones before it. This does not change the fitted values, but it has the advantage that you can see whether a certain order in the polynomial significantly improves the regression over the lower orders.

Consider the simple cars data, with stopping distance as the response and driving speed as the predictor. Physically, this should be a quadratic relationship, but in this (old) dataset the quadratic term is not significant:

m1 <- lm(dist ~ poly(speed, 2), data = cars)
m2 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
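As a quick sanity check (my sketch, not part of the original answer): the two codings give identical fitted values, and the orthogonal columns really are orthonormal. It also shows the manual prediction asked about in the question: with the orthogonal coding you must evaluate the same basis at the new points, which predict() does for a poly object:

all.equal(fitted(m1), fitted(m2))           # TRUE: the same fitted curve
zapsmall(crossprod(poly(cars$speed, 2)))    # identity matrix: orthonormal columns

# manual prediction with the orthogonal coding: reuse the fitted basis
basis <- poly(cars$speed, 2)
Z <- predict(basis, newdata = c(10, 15, 20))   # evaluate basis at new speeds
drop(cbind(1, Z) %*% coef(m1))                 # matches predict(m1, ...)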

In the orthogonal coding you get the following coefficients in summary(m1):

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136    

This shows that there is a highly significant linear effect, while the second order is not significant. The latter p-value (i.e., the one for the highest order in the polynomial) is the same as in the raw coding:

                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.47014   14.81716   0.167    0.868
poly(speed, 2, raw = TRUE)1  0.91329    2.03422   0.449    0.656
poly(speed, 2, raw = TRUE)2  0.09996    0.06597   1.515    0.136

but the lower-order p-values change dramatically. The reason is that in model m1 the regressors are orthogonal, while in m2 they are highly correlated:

cor(model.matrix(m1)[, 2], model.matrix(m1)[, 3])
## [1] 4.686464e-17
cor(model.matrix(m2)[, 2], model.matrix(m2)[, 3])
## [1] 0.9794765

Thus, in the raw coding you can only interpret the p-value of speed if speed^2 remains in the model. And as both regressors are highly correlated, one of them can be dropped. However, in the orthogonal coding speed^2 only captures the quadratic part that has not been captured by the linear term. And then it becomes clear that the linear part is significant while the quadratic part has no additional significance.
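To make this concrete (a small illustration of my own, not from the original answer): because the orthogonal columns are centered and mutually orthogonal, refitting without the quadratic term leaves the linear estimate unchanged; only its standard error shifts slightly with the residual variance:

m0 <- lm(dist ~ poly(speed, 1), data = cars)
coef(m0)[2]   # 145.552: identical to the poly(speed, 2)1 estimate in m1
coef(m1)[2]   # 145.552

In the raw coding, by contrast, dropping speed^2 changes the speed coefficient completely.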

Achim Zeileis
    Fantastic answer. Thank you very much. Just a small question: How should one interpret the coefficients of the orthogonal polynomial? – ECII May 02 '15 at 08:45
  • I think the exact units are difficult to interpret, but that is true for either coding. In the orthogonal case, though, the quadratic term gives you the deviations from just the linear polynomial; the cubic term, the deviations from just the quadratic polynomial; and so on. – Achim Zeileis May 02 '15 at 09:04
  • @ECII @ Achim In an answer to http://stackoverflow.com/questions/31457230/r-translate-a-model-having-orthogonal-polynomials-to-a-function-using-qr-decomp I've given a function to translate the orthogonal polynomial to "conventional" coefficients of powers. Although it seems to work, I don't like that function; it really is ugly :-0. I'd appreciate suggestions for alternative approaches. – user20637 Jul 17 '15 at 11:29
  • @AchimZeileis Since I would have run dist ~ speed + I(speed^2), which is equivalent to poly(speed, 2, raw=T) in your example, I would have concluded that neither speed nor speed^2 seems statistically significant, based on the p-values of .656 and .136. On the other hand, model m1 shows speed is statistically significant, but not speed^2. I'm having trouble understanding this... two contradictory pieces of information. Could you help me grasp the concept better? – Rahul Jan 22 '17 at 18:51
  • @Rahul That's the whole point of the orthogonalization. In the raw coding you can only interpret the p-value of speed if speed^2 remains in the model. And as both regressors are highly correlated, one of them can be dropped. However, in the orthogonal coding speed^2 only captures the quadratic part that has not been captured by the linear term. And then it becomes clear that the linear part is significant while the quadratic part has no additional significance. – Achim Zeileis Jan 22 '17 at 20:44
  • @AchimZeileis I think your last comment should maybe be part of your formal answer, such was my leap in personal insight after reading your statement that the interpretation of the p-values depends on all the variables present in the raw coding model. This is one aspect of regression easily forgotten in the excitement, by me at least: everything (e.g. the coefficient table) is conditioned on everything else (i.e. the other variables). As you say, in an orthogonal model the 'droppable' variables are more evident, and it is this 'so what' that helped my understanding immensely. – Big Old Dave May 16 '17 at 13:37
    @Big Old Dave OK, good suggestion. It surely cannot "hurt". So I've added it at the end of the answer. Thanks! – Achim Zeileis May 16 '17 at 20:17
  • Do you know which orthogonal polynomials it uses, though? – Charlie Parker Dec 04 '17 at 17:00
  • The manual `?poly` says in the "Details" section: The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343-4). – Achim Zeileis Dec 04 '17 at 18:31
  • Suppose I have a significant `poly(speed, 2)2` term and an insignificant `poly(speed, 2)1`. Is it valid to include only `poly(speed, 2)[,2]` in my full model? And is the first direction `poly(speed, 2)[,1]` equivalent to speed? – jakes Jul 07 '18 at 16:09
  • What if I want to fit this polynomial and use it to calculate the value of X that gives me the maximum Y? Would I need to use raw=T or raw=F to compute the polynomial? – skan Jul 05 '22 at 17:44
  • This is easier in the `raw = TRUE` format, where you just need to know the coefficients: x = -b1/(2 * b2), where b1 and b2 are the coefficients of the raw linear and quadratic terms respectively. For orthogonal polynomials you need to take the observed regressor vector into account as well. – Achim Zeileis Jul 05 '22 at 18:49
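A minimal sketch of that last formula, using m2 from the answer (note that for cars the fitted quadratic coefficient is positive, so this stationary point is a minimum and falls outside the observed speeds; only the mechanics matter here):

b <- coef(m2)                # intercept, linear, quadratic (raw coding)
unname(-b[2] / (2 * b[3]))   # stationary point of b0 + b1*x + b2*x^2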

I believe the way polynomial regression would be run with raw=T is to look at the highest-power term and assess its significance from the p-value of that coefficient.

If it is found not significant (large p-value), the regression is re-run without that power (i.e., at the next lower degree), and this is repeated one step at a time, reducing the degree as long as the top term is not significant.

As soon as the highest remaining degree is significant, the process stops and that degree is taken as the appropriate one.
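A rough sketch of that procedure (my own illustration, not from the answer; prune_poly is a hypothetical helper and the 0.05 cutoff is an assumption):

# drop the highest power as long as its coefficient is not significant
prune_poly <- function(y, x, max_degree, alpha = 0.05) {
  for (d in max_degree:1) {
    fit <- lm(y ~ poly(x, d, raw = TRUE))
    p_top <- coef(summary(fit))[d + 1, "Pr(>|t|)"]
    if (p_top < alpha) return(fit)   # highest term significant: keep degree d
  }
  lm(y ~ 1)                          # nothing significant: intercept only
}

# on the cars data this drops speed^2 (p = 0.136) and keeps the
# clearly significant linear fit
prune_poly(cars$dist, cars$speed, max_degree = 2)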

Joe F