I am hoping to run some relatively simple code in R to help determine which independent variables would be meaningful based on their p-value in a logistic regression. I know that the SignifReg function exists to help determine meaningful variables for lm objects, but is there a similar function/package that exists for logits? Thanks!

The SignifReg function is part of the SignifReg package, and more info can be found here: https://www.rdocumentation.org/packages/SignifReg/versions/3.0/topics/SignifReg

juliah0494
  • 1
    can you tell us approximately what the `SignifReg` function does (and what package it's in) so we don't have to go find it? (Ots – Ben Bolker Sep 28 '20 at 23:32
  • Hi Ben, sorry about that. Just edited the post! – juliah0494 Sep 28 '20 at 23:34
  • 1
    Have a look at [this post](https://stackoverflow.com/questions/3701170/stepwise-regression-using-p-values-to-drop-variables-with-nonsignificant-p-value) it seems to be driving at something similar. – DaveArmstrong Sep 28 '20 at 23:36
  • 1
    "stepwise" and "GLM OR logistic" are probably the keywords that will find this. FWIW `stepAIC()` in the built-in `MASS` package will do some of the basics. You should be aware that a lot of statisticians (including me) feel that stepwise regression is a Really Bad Idea: see https://en.wikipedia.org/wiki/Stepwise_regression#Criticism or google "stepwise regression critique" ... – Ben Bolker Sep 28 '20 at 23:47
  • Noted. Thank you @BenBolker and DaveArmstrong – juliah0494 Sep 28 '20 at 23:49
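
As a rough illustration of the `stepAIC()` suggestion in the comments above, here is a minimal sketch on simulated data. The data frame `d` and the variables `x1`, `x2`, `x3`, and `y` are made up for the example. Note that `stepAIC()` selects by AIC rather than by p-value, and the criticism of stepwise selection raised above still applies.

```r
library(MASS)  # provides stepAIC()

# Simulated data (hypothetical): only x1 truly affects the outcome
set.seed(1)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.3 + 1.2 * d$x1))

# Full logistic model with every candidate predictor
full <- glm(y ~ x1 + x2 + x3, data = d, family = binomial)

# Backward elimination by AIC; trace = FALSE suppresses the step-by-step log
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

# Formula of the model that stepAIC() settled on
formula(reduced)
```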

2 Answers

If you just want the p-values, here is how you can get them:

# building some data
df <- data.frame(response = rbinom(100, 1, 0.75),
       var1 = runif(100),
       var2 = rnorm(100),
       var3 = 1:100,
       var4 = rexp(100))
# making a model
mod <- glm(response ~ var1 + var2 + var3 + var4, data = df, family = "binomial")

You can extract the model coefficient estimates, standard errors, and p-values by calling coef() on the model summary. Ask for the fourth column to get just the p-values.

coef(summary(mod))[,4]
(Intercept)        var1        var2        var3        var4 
 0.05886951  0.21382708  0.41254249  0.16239709  0.80457330 
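
Selecting the column by name is a little less fragile than selecting it by position. The sketch below rebuilds the same kind of toy data (with a seed added for reproducibility, so the numbers will differ from the output above) and lists the terms below a cutoff. The cutoff `alpha` is an arbitrary choice for the example.

```r
# Toy data as above, with a seed added so the run is reproducible
set.seed(123)
df <- data.frame(response = rbinom(100, 1, 0.75),
                 var1 = runif(100),
                 var2 = rnorm(100),
                 var3 = 1:100,
                 var4 = rexp(100))
mod <- glm(response ~ var1 + var2 + var3 + var4, data = df, family = "binomial")

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(mod))

# Pick the p-value column by name instead of by index
p_values <- coefs[, "Pr(>|z|)"]

# Names of the terms whose p-value falls below the (assumed) cutoff
alpha <- 0.05
names(p_values)[p_values < alpha]
```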
Ben Norris
  • Hi Ben, thanks for the information! I should've been clearer in my question. I understand how to view p-values, but I want to run something such that R builds a model by selecting the independent variables with the lowest p-values and leaving out the least significant ones – juliah0494 Sep 28 '20 at 23:47

The rms package has a function, fastbw, that performs backward elimination on fitted models such as logistic and Cox regressions. It is based on Lawless and Singhal (1978), 'a first-order approximation that has greater numerical efficiency than full backward selection'. For example:

Glmfullfit <- Glm(Hypertension ~ ., family = binomial, data = mydata)  # Glm from rms
fastbw(Glmfullfit, rule = 'p', type = 'individual', sls = 0.1)  # retention p-value = 0.1

In SAS, the equivalent code is selection method = backward(fast) (I did not try this code due to SAS cloud). Don't use the coefficients from the fastbw results, because of the methodology fastbw uses. Instead, take the variables that survive the selection and refit the model with glm to get the coefficients and other parameters.
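
The refitting step might look roughly like the sketch below. It is only an illustration: the data are simulated, the names `d`, `x1` to `x4`, and `y` are made up, and the full model is fitted with rms's `lrm` rather than `Glm` as above. It also assumes the `names.kept` component of the `fastbw` result holds the names of the surviving variables.

```r
library(rms)  # provides lrm() and fastbw()

# Simulated data (hypothetical): x1 and x2 matter, x3 and x4 are noise
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * d$x1 + 0.8 * d$x2))

# Full logistic model, fitted with rms so that fastbw() can use it
full <- lrm(y ~ x1 + x2 + x3 + x4, data = d)

# Fast backward elimination on individual p-values, retention level 0.1
bw <- fastbw(full, rule = "p", type = "individual", sls = 0.1)

# Names of the variables that survived (assumed component: names.kept)
kept <- bw$names.kept

# Refit with plain glm(); fall back to intercept-only if nothing survived
rhs <- if (length(kept) > 0) kept else "1"
refit <- glm(reformulate(rhs, response = "y"), data = d, family = binomial)

# Coefficients, standard errors, and p-values from the refitted model
coef(summary(refit))
```

Bear in mind that p-values from the refitted model do not account for the selection step, so they tend to look more favourable than they should.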

MLavoie
yie_stack