52

I'm running many regressions and am only interested in the effect on the coefficient and p-value of one particular variable. So, in my script, I'd like to be able to just extract the p-value from the glm summary (getting the coefficient itself is easy). The only way I know of to view the p-value is using summary(myReg). Is there some other way?

e.g.:

fit <- glm(y ~ x1 + x2, myData)
x1Coeff <- fit$coefficients[2] # only returns coefficient, of course
x1pValue <- ???

I've tried treating fit$coefficients as a matrix, but am still unable to simply extract the p-value.

Is it possible to do this?

Thanks!

ch-pub
  • 1,664
  • 6
  • 29
  • 52

5 Answers5

72

You want

coef(summary(fit))[,4]

which extracts the column vector of p values from the tabular output shown by summary(fit). The p-values aren't actually computed until you run summary() on the model fit.

By the way, use extractor functions rather than delve into objects if you can:

fit$coefficients[2]

should be

coef(fit)[2]

If there aren't extractor functions, str() is your friend. It allows you to look at the structure of any object, which allows you to see what the object contains and how to extract it:

summ <- summary(fit)

> str(summ, max = 1)
List of 17
 $ call          : language glm(formula = counts ~ outcome + treatment, family = poisson())
 $ terms         :Classes 'terms', 'formula' length 3 counts ~ outcome + treatment
  .. ..- attr(*, "variables")= language list(counts, outcome, treatment)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. ..- attr(*, "term.labels")= chr [1:2] "outcome" "treatment"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(counts, outcome, treatment)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "factor" "factor"
  .. .. ..- attr(*, "names")= chr [1:3] "counts" "outcome" "treatment"
 $ family        :List of 12
  ..- attr(*, "class")= chr "family"
 $ deviance      : num 5.13
 $ aic           : num 56.8
 $ contrasts     :List of 2
 $ df.residual   : int 4
 $ null.deviance : num 10.6
 $ df.null       : int 8
 $ iter          : int 4
 $ deviance.resid: Named num [1:9] -0.671 0.963 -0.17 -0.22 -0.956 ...
  ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 $ coefficients  : num [1:5, 1:4] 3.04 -4.54e-01 -2.93e-01 1.34e-15 1.42e-15 ...
  ..- attr(*, "dimnames")=List of 2
 $ aliased       : Named logi [1:5] FALSE FALSE FALSE FALSE FALSE
  ..- attr(*, "names")= chr [1:5] "(Intercept)" "outcome2" "outcome3" "treatment2" ...
 $ dispersion    : num 1
 $ df            : int [1:3] 5 4 5
 $ cov.unscaled  : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 $ cov.scaled    : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 - attr(*, "class")= chr "summary.glm"

Hence we note the coefficients component which we can extract using coef(), but other components don't have extractors, like null.deviance, which you can extract as summ$null.deviance.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • 1
    you beat me while I was looking for dupes (there aren't any perfect duplicates but there are a lot of posts about extracting stuff from `[g]lm` fits: e.g. http://stackoverflow.com/questions/12496368/how-to-extract-tabular-summary-data-from-an-lm-command-in-r ) – Ben Bolker May 23 '14 at 22:08
  • 3
    might be worth adding a comment on `str()` when you don't know what accessors are available and have to try to dig stuff out of the object for yourself. – Ben Bolker May 23 '14 at 22:13
  • Thanks @BenBolker. Have added something on `str`. – Gavin Simpson May 23 '14 at 22:31
  • 1
    Actually, I was using str() to try to figure out how to get the data, but I don't see where in the str() I can deduce that I need the coef() function to get what I'm looking for. I'm reading you're update and I still don't see that either... – ch-pub May 23 '14 at 22:40
  • 1
    @Ben I looked a few times for other posts but couldn't find any. – ch-pub May 23 '14 at 22:41
  • 3
    the way to find out about `coef` is to do `methods(class="lm")` or `methods(class="summary.lm")`. I agree that you can't figure out from looking at `str()` that you could use `coef()`. – Ben Bolker May 23 '14 at 23:11
  • Actually, I can't even see in the methods(class="summary.glm") or methods(class="glm") output that I should be able to use coef() on the object. Could you please elaborate? Thanks! – ch-pub May 23 '14 at 23:59
  • I tend to use `names(summ)` or `attributes(summ)` to get at the attributes of an object; although this does not tell me the methods it gives me more compact info than `str`. Sometimes I use the frowned upon `summ$coefficients` if I'm not sure of the extractor function; I have not yet received an excommunication notice from the Church of R. – James King May 23 '14 at 23:59
  • 1
    @Clark Look at `class(fit) and you'll see that a `glm` fit inherits from class `"lm"` so you would need to look for methods for that class. – Gavin Simpson May 24 '14 at 00:18
  • I've been using this: cbind( coef(summary(model))[, 4], exp(coef(model)), exp(confint(model))) , but it takes a long time "Waiting for profiling to be done..." any suggestions as to how to fix this? – Mark Apr 13 '23 at 19:00
  • 1
    @Mark don't use profile likelihood to compute confidence intervals for parameters (which is what the `confint()` method for GLMs is doing. – Gavin Simpson Apr 13 '23 at 22:35
  • thanks! @GavinSimpson, what should I use instead? – Mark Apr 14 '23 at 14:29
  • 1
    @Mark well, there's a reason `confint()` is using profile likelihood - the usual Gaussian approximation can often give poor confidence intervals in GLMs. Unfortunately profiling the likelihood of the model takes time. You *could* form intervals on the link scale as beta_i +/- (2 * std_err_i) and then back transform those intervals to the response scale using the inverse of the link function, but that's using the gaussian approximation which can be problematic in some situations. Bootstrapping could also be used. That won't be quick either. – Gavin Simpson Apr 14 '23 at 14:49
9

Instead of the number you can put directly the name

coef(summary(fit))[,'Pr(>|z|)']

the other ones available from the coefficient summary:

Estimate Std. Error z value Pr(>|z|)

R. Prost
  • 1,958
  • 1
  • 16
  • 21
4

I've used this technique in the past to pull out predictor data from summary or from a fitted model object:

coef(summary(m))[grepl("var_i_want$",row.names(coef(summary(m)))), 4]

which lets me easily edit which variable I want to get data on.

Or as pointed out be @Ben, use match or %in%, somewhat cleaner than grepl:

coef(summary(m))[row.names(coef(summary(m))) %in% "var_i_want" , 4]
James King
  • 6,229
  • 3
  • 25
  • 40
3

The tidy function from the broom package (part of the Tidyverse, available on CRAN) provides a quick and easy way to convert your GLM summaries into a data frame, which might be useful in a number of situations other than the one you described above.

In this case, your desired output could be obtained with the code:

x1pValue <- broom::tidy(fit)$p.value[2]
1

Well, this would be another way, however not the most efficient way to perform it :

a = coeftable(model).cols[4]
pVals = [ a[i].v for i in 1:length(a) ]

This ensures that the extracted values from the glm is not in StatsBase. Therein, you can play around with pVals as per your requirement. Hope it helps, Ebby

Philipp Zettl
  • 177
  • 1
  • 3
  • 17
Ebby
  • 141
  • 1
  • 4