function for removing nonsignificant variables at one step in R

Question

I am trying to automate logistic regression in R. My source code will generate a new equation every day as the input data is updated (the variables, data format, etc. stay the same) and print out the significant variables with their corresponding coefficients. When I use the `step` function, some of the resulting coefficients are not significant. Therefore, I want to update my set of coefficients and drop all the ones that are not significant enough. Is there a function or an automated way of doing this? If not, the only way I can think of is writing a script in another language that takes the coefficients and corresponding p-values, checks significance, and reruns R accordingly. But even for that, do you know how I can get only the p-values and coefficients of the variables? I can print the whole summary of the regression result with the `summary` function, but I can't get at the p-values alone.

Thank you very much

I'm sure people will give you good technical advice on how to do what you ask, but my advice would be to **seriously reconsider doing this at all**. There are a (very) few good reasons to do stepwise regression, and many many good reasons not to: see e.g. http://www.stata.com/support/faqs/stat/stepwise.html for a start ... — Ben Bolker, Apr 01 '12 at 22:28
and if you insist: http://stackoverflow.com/questions/3701170/stepwise-regression-using-p-value — Ben Bolker, Apr 01 '12 at 22:37
Yes, perhaps instead of automating extraction of P-values consider automating extraction of likelihoods and number of parameters. K. Burnham and D. Anderson have published a bunch of papers and books on model selection and AIC. — Mark Miller, Apr 01 '12 at 22:49
@BenBolker Thanks for the link. To automate this, the stepwise function was the first thing that came to my mind. @MarkMiller, thanks for the references. I'll read them before applying this. — sahara, Apr 01 '12 at 23:12
It would be useful to know *why* you are constructing these logistic regressions -- for prediction (in which case you might want to use multi-model averaging à la Burnham and Anderson, or perhaps better, penalized regression as in the `glmnet` package)? For hypothesis testing? For categorization? — Ben Bolker, Apr 01 '12 at 23:39
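
The quantities mentioned in the comments above (log-likelihood, number of parameters, AIC) can be pulled from a fitted model with base R. This is only a minimal sketch: the built-in `mtcars` data and the formula `am ~ wt + hp` are stand-ins chosen for illustration, not the asker's data or model.

```r
# Illustrative logistic regression on a built-in dataset (assumed example).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Log-likelihood of the fitted model; its "df" attribute holds the
# number of estimated parameters.
ll <- logLik(fit)
k  <- attr(ll, "df")

# AIC as reported by R, and the same value computed by hand.
aic_builtin <- AIC(fit)
aic_manual  <- -2 * as.numeric(ll) + 2 * k

print(c(logLik = as.numeric(ll), n_params = k, AIC = aic_builtin))
```

These values could be logged for each day's fit and compared across candidate models instead of filtering on p-values.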

Eric Fail · Answer 1 · 2012-04-01T22:31:05.857


It's a bit hard for me without sample code and data, but you can subset based on variable values like this:

    newdata <- data[ which(data$p.value < 0.5), ]

You can inspect your R object with `str` (see `?str`) to find out which components, such as `$p.value` or `$residuals`, you can select for the subset.

If this doesn't answer your question, try submitting some sample code and data.
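
As a rough sketch of how that subsetting might look for a logistic regression: `coef(summary(fit))` returns a matrix of estimates and p-values, which can be filtered like any other table. The `mtcars` data, the formula, and the `alpha` cutoff below are assumptions for illustration only.

```r
# Illustrative logistic regression on a built-in dataset (assumed example).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# For a binomial glm, coef(summary(fit)) is a numeric matrix with one row per
# term and the columns "Estimate", "Std. Error", "z value" and "Pr(>|z|)".
coef_table <- coef(summary(fit))

# Keep only the coefficient estimates and their p-values.
result <- data.frame(
  term      = rownames(coef_table),
  estimate  = coef_table[, "Estimate"],
  p.value   = coef_table[, "Pr(>|z|)"],
  row.names = NULL
)

# Subset to the terms below a chosen significance threshold (assumed cutoff).
alpha <- 0.05
significant <- result[which(result$p.value < alpha), ]
print(significant)
```

Running `str(coef_table)` shows the same structure, which is how you would discover the column names in the first place.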

Best, Eric

edited Apr 01 '12 at 22:31

answered Apr 01 '12 at 22:25


Thanks, this might work. I will try it on Monday and let you know. – sahara Apr 01 '12 at 23:02
