Suppose I fit a polynomial logistic regression with all covariates available to me. Then, I decide I want to make the model simpler by removing covariates that either harm or do little to improve prediction as assessed by an out-of-sample set of data (using, say, cross-validation). I would like to use a genetic algorithm or backwards selection with AIC. However, I have not found an implementation that respects the hierarchical structure of a polynomial. For example, the feature selection procedure might keep an x^2 term, but drop the main effect x. I do not want this.
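To make the problem concrete, here is a minimal sketch on simulated data. The variable names (`x`, `z`, `y`) and the coefficients are made up for illustration and are not from my real data. It uses `drop.scope()` from base R's stats package, which as far as I can tell is what `drop1()` and `step()` consult to decide which terms are candidates for removal.

```r
# Simulated data: variable names and coefficients are invented for illustration.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), z = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x^2 - 0.5 * d$z))

# Polynomial logistic regression with the quadratic terms written via I().
fit_raw <- glm(y ~ x + I(x^2) + z + I(z^2), family = binomial, data = d)

# Terms that backwards selection is allowed to drop at the first step.
raw_scope <- drop.scope(fit_raw)
print(raw_scope)
# x is listed as droppable even though I(x^2) would stay in the model.

# For comparison: with an interaction, the main effects are protected.
int_scope <- drop.scope(y ~ x + z + x:z)
print(int_scope)
```

So the hierarchy is respected for interaction terms, but `x` and `I(x^2)` are treated as unrelated terms, which is exactly the behaviour I want to avoid.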
So, how can I make feature selection for polynomial logistic regression in R retain the lower-order terms of a covariate whenever it keeps a higher-order term of that same covariate?