5

Suppose I have a model like so:

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- x1 + 5 * x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)

How can I output summary(fit) but in order of the magnitude of the estimated coefficients?

badmax
  • 645
  • 2
  • 5
  • 12
  • You can use `coef(fit)` Try `fit$coefficients <- coef(fit)[order(coef(fit))]` – akrun May 18 '19 at 21:58
  • https://stackoverflow.com/questions/6577058/extract-regression-coefficient-values – bbiasi May 18 '19 at 22:04
  • You can do this to get what you want: `coef(summary(fit))[order(coef(summary(fit))[,1], decreasing = T),]`. It keeps the errors, etc. ordered as they should be and shows the actual p-values instead of (what appear as) truncated values. However, inputting them back into the `summary.lm` object looks trickier. According to `?summary.lm`, it appears that the standard error, t-values and p-values are recalculated every time you call `summary`, which seems to be why all those values were changing. To do what you're asking seems like a much more complicated task than it at first appears. – hmhensen May 18 '19 at 23:33

1 Answers1

3

If you don't mind the overhead of loading an external package, broom makes this trivial:

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- x1 + 5 * x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)

library(broom)
coefs <- tidy(fit)
coefs[order(coefs$estimate, decreasing = TRUE),]
#> # A tibble: 3 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 x2            4.95      0.0883    56.1   1.04e-75
#> 2 x1            1.17      0.109     10.7   3.27e-18
#> 3 (Intercept)   0.0131    0.103      0.128 8.99e- 1

Created on 2019-05-18 by the reprex package (v0.2.1)

Edit - adding stat significance annotations

You could add this after the fact:

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- x1 + 5 * x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)

library(broom)
coefs <- tidy(fit)
coefs$p.value <- with(coefs, 
                      ifelse(abs(p.value) > .1, paste0(formatC(p.value, format = "e", digits = 2),""),
                             ifelse(abs(p.value) > .05, paste0(formatC(p.value, format = "e", digits = 2),"."),
                                    ifelse(abs(p.value) > .01, paste0(formatC(p.value, format = "e", digits = 2),"*"),
                                           ifelse(abs(p.value) > .001, paste0(formatC(p.value, format = "e", digits = 2),"**"),
                                           paste0(formatC(p.value, format = "e", digits = 2),"***"))))))
coefs[order(coefs$estimate, decreasing = TRUE),]
#> # A tibble: 3 x 5
#>   term        estimate std.error statistic p.value    
#>   <chr>          <dbl>     <dbl>     <dbl> <chr>      
#> 1 x2            4.91      0.0923    53.2   1.51e-73***
#> 2 x1            0.768     0.0890     8.64  1.17e-13***
#> 3 (Intercept)  -0.0327    0.0990    -0.330 7.42e-01

Created on 2019-05-18 by the reprex package (v0.2.1)

Community
  • 1
  • 1
Chase
  • 67,710
  • 18
  • 144
  • 161
  • That's pretty close, but I was hoping to keep the stars and such from the summary to quickly see what's significant. – badmax May 18 '19 at 22:12
  • @badmax look at the p value column. For example, x1 and x2 coefficients are very much statistically significant. When in doubt, use the following calculator to convert the e notation into more digestible format https://www.free-online-calculator-use.com/scientific-notation-converter.html – PsychometStats May 18 '19 at 23:44
  • 1
    @badmax - see edit. Downside to this approach is you convert the p.value column to a character, so can no longer manipulate this column as if it were a number. – Chase May 19 '19 at 00:03
  • I ended up adding a new column with the annotations. – badmax May 19 '19 at 01:05