
Suppose I fit a polynomial logistic regression with all covariates available to me. Then I decide I want to simplify the model by removing covariates that either harm or do little to improve prediction, as assessed on out-of-sample data (using, say, cross-validation). I would like to use a genetic algorithm or backward selection with AIC. However, I have not found an implementation that respects the hierarchical structure of a polynomial. For example, the feature selection procedure might keep an x^2 term but drop the main effect x. I do not want this.

So, how can I make feature selection for polynomial logistic regression in R retain the lower-order terms of a feature whenever it keeps a higher-order term of that same feature?
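To illustrate, here is a minimal sketch of two workarounds I have considered with base R's `step()`. The data and variable names (`x1`, `x2`, `dat`) are made up for demonstration; I am not certain either approach fully captures the marginality principle.

```r
# Hypothetical data for illustration only.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 * x1 - 0.8 * x1^2 + 0.3 * x2))
dat <- data.frame(y, x1, x2)

# Option 1: bundle each polynomial with poly(); step() treats poly(x1, 2)
# as a single term, so it is kept or dropped as a block and x1^2 can
# never survive without x1 (but x1 also cannot stay without x1^2).
fit1 <- glm(y ~ poly(x1, 2) + poly(x2, 2), family = binomial, data = dat)
sel1 <- step(fit1, direction = "backward", trace = 0)

# Option 2: enter terms separately, but protect the main effects via the
# `lower` component of scope so backward selection cannot drop them.
# Note this is stricter than marginality: x1 is forced to stay even if
# I(x1^2) has already been dropped.
fit2 <- glm(y ~ x1 + I(x1^2) + x2 + I(x2^2), family = binomial, data = dat)
sel2 <- step(fit2, direction = "backward",
             scope = list(lower = ~ x1 + x2), trace = 0)
```

Neither is quite what I want: option 1 is all-or-nothing per variable, and option 2 keeps main effects unconditionally rather than conditionally on the presence of their higher-order terms.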

Aegis
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 02 '23 at 19:08
  • This is a good question; the principle of *marginality* (as described e.g. by Venables 1998, http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf) is well understood and respected by R machinery like `step()`, `MASS::stepAIC`, and `glmulti` for interactions among variables, but it applies equally to polynomial models, and I don't know of easy machinery to do this ... https://www.psychology.mcmaster.ca/bennett/psy710/notes/mw7-marginality-example.html – Ben Bolker Mar 02 '23 at 21:26
  • Good to know, thank you! I'll probably add this as a feature request to the tidymodels folks. Probably pretty low on their priority list. – Aegis Mar 03 '23 at 17:03

0 Answers