> but why don't they give the same fitted values and adjusted R²?
Fitted values won't necessarily be the same if you don't use all columns from `poly`.
```r
set.seed(0)
y <- runif(100)
x <- runif(100)
X <- poly(x, 3)

all.equal(lm(y ~ X)$fitted, lm(y ~ x + I(x ^ 2) + I(x ^ 3))$fitted)
#[1] TRUE

all.equal(lm(y ~ X[, 1:2])$fitted, lm(y ~ x + I(x ^ 2))$fitted)
#[1] TRUE

all.equal(lm(y ~ X - 1)$fitted, lm(y ~ x + I(x ^ 2) + I(x ^ 3) - 1)$fitted)  ## no intercept
#[1] "Mean relative difference: 33.023"

all.equal(lm(y ~ X[, c(1, 3)])$fitted, lm(y ~ x + I(x ^ 3))$fitted)
#[1] "Mean relative difference: 0.03008166"

all.equal(lm(y ~ X[, c(2, 3)])$fitted, lm(y ~ I(x ^ 2) + I(x ^ 3))$fitted)
#[1] "Mean relative difference: 0.03297488"
```
We only have `~ 1 + poly(x, degree)[, 1:k]` equivalent to `~ 1 + x + I(x ^ 2) + ... + I(x ^ k)`, for any `k <= degree`. (I explicitly write out the intercept, to emphasize that we have to start from the polynomial of degree 0.)
(The reason is related to how an orthogonal polynomial is generated. See How `poly()` generates orthogonal polynomials? How to understand the "coefs" returned? for great detail. Note that when doing a QR factorization `X = QR`, since `R` is an upper triangular matrix (not a diagonal matrix), `Q[, ind]` will not have the same column space as `X[, ind]` for an arbitrary subset `ind`, unless `ind = 1:k`.)
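To see this concretely, here is a minimal sketch of the QR argument, reusing the `x` and `y` simulated above. (It mimics the linear algebra only, not `poly`'s exact internal computation, which also centers and normalizes.)

```r
## QR-factorize the raw cubic basis: Q is an orthogonal basis,
## playing the role of poly(x, 3); R is upper triangular
Xraw <- outer(x, 1:3, "^")   ## columns: x, x^2, x^3
Q <- qr.Q(qr(Xraw))

## leading columns: Q[, 1:2] and Xraw[, 1:2] span the same space
all.equal(lm(y ~ Q[, 1:2])$fitted, lm(y ~ Xraw[, 1:2])$fitted)
#[1] TRUE

## arbitrary subset: column spaces differ, so the fits differ
all.equal(lm(y ~ Q[, c(1, 3)])$fitted, lm(y ~ Xraw[, c(1, 3)])$fitted)
## reports a "Mean relative difference", not TRUE
```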
So `I(x ^ 2)` is not equivalent to `poly(x, 2)[, 2]`, and hence you will get different fitted values and a different (adjusted) R².
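A direct check of this claim, again with the simulated `x` and `y`:

```r
all.equal(lm(y ~ poly(x, 2)[, 2])$fitted, lm(y ~ I(x ^ 2))$fitted)
## reports a "Mean relative difference", not TRUE
```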
> is it valid to use only the second term of the orthogonal polynomial and specify the model as follows?
It is really a bad idea for `leaps` (or, generally, any model-selection procedure) to drop columns from an orthogonal polynomial. An orthogonal polynomial is a factor-like term, whose significance is determined by an F-statistic (i.e., treating all columns as a whole), rather than by t-statistics for individual columns.
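For example, a hedged sketch of how to judge the term as a whole, via an F-test on nested models rather than the per-column t-tests of `summary()`:

```r
## compare nested models with anova(): the F-test treats both
## columns of poly(x, 2) jointly
fit0 <- lm(y ~ 1)
fit2 <- lm(y ~ poly(x, 2))
anova(fit0, fit2)
```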
In fact, even for raw polynomials, it is not a good idea to omit any low-order term. For example, `y ~ 1 + I(x ^ 2)`, which omits the linear term, is problematic: the fit is not invariant to a linear shift. For example, if we shift `x` to get `x1`:
```r
shift <- runif(1)  ## an arbitrary value; can be `mean(x)`
x1 <- x - shift
```
then `y ~ 1 + I(x ^ 2)` is not equivalent to `y ~ 1 + I(x1 ^ 2)`, but `y ~ 1 + x + I(x ^ 2)` is still equivalent to `y ~ 1 + x1 + I(x1 ^ 2)`.
```r
all.equal(lm(y ~ 1 + I(x ^ 2))$fitted, lm(y ~ 1 + I(x1 ^ 2))$fitted)
#[1] "Mean relative difference: 0.02020984"

all.equal(lm(y ~ 1 + x + I(x ^ 2))$fitted, lm(y ~ 1 + x1 + I(x1 ^ 2))$fitted)
#[1] TRUE
```
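The full orthogonal polynomial is shift-invariant too, since `poly(x, 2)` and `poly(x1, 2)` (each together with the intercept) span the same degree-2 polynomial space; a quick check with the variables above:

```r
all.equal(lm(y ~ poly(x, 2))$fitted, lm(y ~ poly(x1, 2))$fitted)
#[1] TRUE
```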
I briefly mentioned the issue of dropping columns at R: How to or should I drop an insignificant orthogonal polynomial basis in a linear model?, but my examples here give you more insight.
> Is there a more direct way to retrieve a single model from `regsubsets` output than specifying the model by hand?
I don't know; at least I did not figure it out almost 2 years ago when answering this thread: Get all models from leaps regsubsets.
> One remaining question though. Assuming that `leaps` returns `poly(X, 2)2`, I should definitely retain `poly(X, 2)1` in my model. But what if only `poly(X, 2)1` is returned by `leaps`? Can the higher order term be dropped then?
There is no problem dropping higher order terms (in this case, where you originally fitted a quadratic polynomial). As I said, we have the equivalence for `ind = 1:j`, where `j <= degree`. But make sure you understand this. Take the following two examples; a quick numerical check of the first follows them.
- If `leaps` drops `poly(x, 5)3` and `poly(x, 5)5`, you can safely remove `poly(x, 5)5` but are still advised to retain `poly(x, 5)3`. That is, instead of fitting a 5th-order polynomial, you fit a 4th-order one.
- If `leaps` drops `poly(x, 6)3` and `poly(x, 6)5`, then since `poly(x, 6)6` is not dropped, you are advised to drop no terms at all.
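For the first case, reusing the `x` and `y` simulated earlier: the first four columns of `poly(x, 5)` span the same space as `poly(x, 4)`, so keeping only the leading columns really is the same as fitting a lower-order polynomial.

```r
all.equal(lm(y ~ poly(x, 5)[, 1:4])$fitted, lm(y ~ poly(x, 4))$fitted)
#[1] TRUE
```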