I am trying to automate logistic regression in R. My code will generate a new equation every day as the input data is updated (the variables, data format, etc. stay the same) and print out the significant variables with their corresponding coefficients. When I use the step function, some of the resulting coefficients are not significant. I therefore want to update my set of coefficients and get rid of all the ones that are not significant enough. Is there a function or automated way of doing this? If not, the only way I can think of is to write a script in another language that takes the coefficients and their corresponding p-values, checks their significance, and reruns R accordingly. Even for that, though, do you know how I can get only the p-values and coefficients of the variables? I can print the whole summary of the regression result with the summary function, but I can't get at the p-values alone.
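The automated refit described above can be done entirely in R, without a round trip through another language. Below is a minimal sketch that drops the term with the largest p-value and refits until every remaining term has p < 0.05. It is exactly the kind of p-value-driven stepwise selection the comments caution against. The helper `drop_insignificant`, the 0.05 cut-off and the simulated data are hypothetical, and the sketch assumes numeric predictors so that each coefficient name matches a term in the formula.

```r
# Hypothetical helper: backward elimination by p-value for a logistic model.
# Assumes numeric predictors (one coefficient per term in the formula).
drop_insignificant <- function(model_formula, data, alpha = 0.05) {
  fit <- glm(model_formula, family = binomial, data = data)
  repeat {
    coefs <- coef(summary(fit))
    # Ignore the intercept when judging significance.
    coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
    if (nrow(coefs) == 0) break
    pvals <- coefs[, "Pr(>|z|)"]
    if (max(pvals) < alpha) break
    # Drop the least significant term and refit the model.
    worst <- rownames(coefs)[which.max(pvals)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Simulated example data (hypothetical): only x1 actually affects y.
set.seed(1)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * sim$x1))

final <- drop_insignificant(y ~ x1 + x2 + x3, data = sim)
print(coef(summary(final)))
```

Note that the p-values reported for the final model do not account for the selection step that produced it.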

Thank you very much

sahara
  • I'm sure people will give you good technical advice on how to do what you ask, but my advice would be to **seriously reconsider doing this at all**. There are a (very) few good reasons to do stepwise regression, and many many good reasons not to: see e.g. http://www.stata.com/support/faqs/stat/stepwise.html for a start ... – Ben Bolker Apr 01 '12 at 22:28
  • and if you insist: http://stackoverflow.com/questions/3701170/stepwise-regression-using-p-value – Ben Bolker Apr 01 '12 at 22:37
  • Yes, perhaps instead of automating the extraction of P-values, consider automating the extraction of likelihoods and numbers of parameters. K. Burnham and D. Anderson have published a bunch of papers and books on model selection and AIC. – Mark Miller Apr 01 '12 at 22:49
  • @BenBolker Thanks for the link. For automating this, the stepwise function was the first thing that came to mind. Mark Miller, thanks for the references. I'll read them before going any further. – sahara Apr 01 '12 at 23:12
  • It would be useful to know *why* you are constructing these logistic regressions -- for prediction (in which case you might want to use multi-model averaging à la Burnham and Anderson, or perhaps better penalized regression as in the `glmnet` package)? For hypothesis testing? For categorization? – Ben Bolker Apr 01 '12 at 23:39
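Following Mark Miller's suggestion, the log-likelihood and the number of parameters can be pulled out of a fitted model directly, and AIC computed from them as -2 log L + 2k. A minimal sketch on the built-in `mtcars` data (with `am` as the binary outcome), chosen purely for illustration; the two candidate models are arbitrary, and this only compares models by AIC rather than doing the multi-model averaging Burnham and Anderson describe.

```r
# Two arbitrary candidate logistic models on built-in data (illustration only).
fit_full  <- glm(am ~ hp + wt, family = binomial, data = mtcars)
fit_small <- glm(am ~ wt, family = binomial, data = mtcars)

# Log-likelihood and number of estimated parameters of the larger model.
ll <- logLik(fit_full)
k <- attr(ll, "df")

# AIC by hand; should agree with the built-in AIC().
aic_by_hand <- -2 * as.numeric(ll) + 2 * k
print(c(by_hand = aic_by_hand, AIC = AIC(fit_full)))

# Compare the candidates (lower AIC is preferred).
print(AIC(fit_full, fit_small))
```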

It's a bit hard for me without sample code and data, but you can subset based on variable values like this:

newdata <- data[which(data$p.value < 0.05), ]  # keep rows whose p.value is below 0.05
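To make the `data$p.value` idea concrete for a fitted model, here is a minimal sketch that turns the coefficient table from `summary()` into a data frame with a `p.value` column and then applies the same `which()` subset. The model on the built-in `mtcars` data (`am` as the binary outcome) is only an illustration, and the 0.05 cut-off is an assumption.

```r
# Example logistic regression on built-in data (am = 1 for manual transmission).
fit <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# Turn the coefficient matrix into a data frame with a p.value column.
coef_table <- coef(summary(fit))
coefs <- data.frame(
  term = rownames(coef_table),
  estimate = coef_table[, "Estimate"],
  p.value = coef_table[, "Pr(>|z|)"],
  row.names = NULL
)

# Same subsetting idiom, keeping only terms with p.value below 0.05.
significant <- coefs[which(coefs$p.value < 0.05), ]
print(significant)
```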

You can inspect your R object with str (see ?str) to figure out how to select whatever you want to use in your subset, such as $p.value or $residuals.
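For a logistic regression, that inspection looks roughly like the sketch below (again on `mtcars`, chosen only for illustration). `summary()` of a `glm` fit keeps the estimates and p-values in its `coefficients` matrix, and for a binomial model the p-value column is named "Pr(>|z|)". Residuals come from the fitted model itself: `fit$residuals` holds the working residuals, while `residuals(fit)` returns deviance residuals by default.

```r
fit <- glm(am ~ hp + wt, family = binomial, data = mtcars)
s <- summary(fit)

# Top-level structure of the summary object and its component names.
str(s, max.level = 1)
print(names(s))

# The coefficient table is a matrix whose columns can be picked by name.
print(colnames(s$coefficients))
pvals <- s$coefficients[, "Pr(>|z|)"]
print(pvals)

# Deviance residuals of the fitted model (one per observation).
dev_res <- residuals(fit, type = "deviance")
print(head(dev_res))
```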

If this doesn't answer your question, try posting some sample code and data.

Best, Eric

Eric Fail