1

I have been trying to fit a sequential polynomial regression model in R and have encountered the following issue: while poly(x) offers a quick way, this function does not respect the hierarchical principle, which roughly states that before moving to higher orders, all terms of lower order should have been included in the model.

One solution here, might be to handpick the order of entrance into the model yourself as I have done with a toy dataset below

pred<-matrix(c(rnorm(30),rnorm(30)),ncol=2)
y<-rnorm(30)

polys<-poly(pred,degree=4,raw=T)
z<-matrix(c(
#order 2
polys[,2],polys[,6],polys[,9],
#order 3
polys[,3],polys[,7],polys[,10],polys[,12],
#order 4
polys[,4],polys[,8],polys[,11],polys[,13],polys[,14]),
ncol=12)

polyreg3<-function(x){
BICm<-rep(0,dim(x)[2])
for(i in 1:dim(x)[2]){
model<-lm(y~pred[,1]+pred[,2]+x[,1:i]) #include one additional term each time
BICm[i]<-BIC(model)
}
list(BICm=BICm)
}

polyreg3(z)
which.min(polyreg3(z)$BICm)

but this is largely impractical for larger degrees of polynomials. I was wondering then, is there a way to deal with this issue, preferably by adapting my code?

josliber
  • 43,891
  • 12
  • 98
  • 133
JohnK
  • 1,019
  • 3
  • 14
  • 30
  • `for` loop are are best avoided in `R`. Removing your loop would be one thing to experiment with. There are plenty of examples of how to do that on SO (for example [here's a more generic example](http://stackoverflow.com/questions/4894506/avoid-two-for-loops-in-r) or [one where somebody is applying a lm to a data.frame](http://stackoverflow.com/questions/27539033/r-apply-lm-on-each-data-frame-row). Also, you may wish to profile your code to find your bottle neck with the [profr package](http://cran.r-project.org/web/packages/profr/index.html). – Richard Erickson Apr 23 '15 at 16:14
  • @RichardErickson Thanks for the suggestions, although they are not my most pressing concerns at the moment. – JohnK Apr 23 '15 at 16:19

1 Answers1

1

If I understand correctly, you'd need not only original independent variables but also all combinations of variables that can be created given a degree.

The data is divided by three - dependent variable, original independent variables and extra variables that are created by model.frame(), given a degree (here it is 2 for simplicity).

Then all combinations of extra variables are obtained by combn() and Map() as ways of selecting columns are variable (1 to # of columns).

Data sets to be fit are created by cbind() and they are independent variable (ind) and original independent variables (original) and extra combinations of variables (extra).

Finally lm() is fit and BIC() values are obtained.

Multiple trials are necessary if a higher order of degree is aimed. For example, if degree is 3, both 2nd and 3rd degrees should be applied.

set.seed(1237)
# independent variable
des <- data.frame(y = rnorm(30))
# dependent variables
pred<-matrix(c(rnorm(30), rnorm(30)), ncol=2)
# model frame given degree, 4095 combinations when degree = 4, set degree = 2 for simplicity
polys <- as.data.frame(poly(pred, degree = 2, raw = T))
# original independent variables
original <- polys[,c(names(polys)[names(polys) == "1.0" | names(polys) == "0.1"])]
# extra variables made by model.frame()
extra <- polys[,c(names(polys)[names(polys) != "1.0" & names(polys) != "0.1"])]
# all combinations of extra variables
# Map() for variable q in nCq, do.call() to make list neat
com <- do.call(c, Map(combn, ncol(extra), 1:ncol(extra), simplify = FALSE))
com
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 1 2

[[5]]
[1] 1 3

[[6]]
[1] 2 3

[[7]]
[1] 1 2 3

# data combined, followed by fitting lm()
bic <- lapply(com, function(x) {
  data <- cbind(des, original, extra[, x, drop = FALSE])
  BIC(lm(y ~ ., data))
})

do.call(c, bic)
[1] 100.3057 104.6485 104.8768 103.6572 103.4162 108.0270 106.7262
Jaehyeon Kim
  • 1,328
  • 11
  • 16