How to get a collection of p-values for linear regression?

Question

I have a data set with 131 columns. The first column is my Y, and the other 130 are my Xs. I want to fit 130 separate linear regressions, lm(y ~ x1), lm(y ~ x2), lm(y ~ x3), ..., lm(y ~ x130), and then get the p-value of each of these fits. How can I do this quickly? Should I use a for loop or apply?

score 3 · Accepted Answer · answered Mar 19 '18 at 18:36

Using base R only, this can be done with a series of *apply calls.

First, I will make up some data since you have posted none.

set.seed(7637)    # Make the results reproducible

n <- 100
dat <- as.data.frame(replicate(11, rnorm(n)))
names(dat) <- c("Y", paste0("X", 1:10))

Now, for the regressions.

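# Fit one model per predictor column; Y is taken from dat, x from the function argument
lm_list <- lapply(dat[-1], function(x) lm(Y ~ x, dat))
# Summarise every fit
lm_smry <- lapply(lm_list, summary)
# Matrix of p-values: rows "(Intercept)" and "x", one column per predictor
lm_pval <- sapply(lm_smry, function(x) x$coefficients[, "Pr(>|t|)"])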
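If only the slope p-value of each fit is needed, a minimal sketch along the same lines is shown here. It rebuilds the made-up data from above so that it runs on its own. The names slope_pval, r, t_stat and cor_pval are mine, not part of the answer. The second half is an optional shortcut that relies on the fact that, in a regression with a single predictor, the slope's t-test is the same as the test of the Pearson correlation. It assumes there are no missing values.

```r
set.seed(7637)    # Same made-up data as above
n <- 100
dat <- as.data.frame(replicate(11, rnorm(n)))
names(dat) <- c("Y", paste0("X", 1:10))

# One p-value per predictor: row 2 of the coefficient table is the slope
slope_pval <- sapply(dat[-1], function(x) {
  summary(lm(dat$Y ~ x))$coefficients[2, "Pr(>|t|)"]
})

# Vectorised shortcut (assumes no NAs): t-statistic from the correlation
r <- cor(dat[-1], dat$Y)[, 1]              # named vector, one r per predictor
t_stat <- r * sqrt((n - 2) / (1 - r^2))    # t with n - 2 degrees of freedom
cor_pval <- 2 * pt(-abs(t_stat), df = n - 2)

all.equal(slope_pval, cor_pval)            # the two approaches should agree
```

With 130 columns either version should run quickly. The shortcut avoids fitting the models at all, which mostly matters when there are many more predictors.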

score 1 · Answer 2 · answered Mar 19 '18 at 18:32

If your data looks something like this (only larger)

> library(dplyr)
> tbl <- data.frame(
+     A = rnorm(10),
+     B = rnorm(10),
+     C = rnorm(10)
+ ) %>% mutate(
+     y = 2 * A + rnorm(10, .1)
+ )
> tbl
            A           B           C           y
1  -1.3430281  0.06457155 -0.31477796 -3.54276780
2  -0.8045598  0.55160502 -0.04486946 -0.17595827
3   0.6432380 -0.38036302  0.30313165  2.71317260
4   0.9282322  0.92453929  1.52828109  1.41677569
5  -0.2104841 -0.31510189 -1.32938820 -0.02714028
6  -1.8264372  0.92910256  0.16072524 -5.09970701
7   0.9568248  0.42829255 -0.28423084  1.58072449
8  -1.2061661 -1.10672961  0.69626390 -3.19605711
9   0.6173230  2.74964116  0.67350556  1.78849532
10 -1.1575590 -0.01747244 -0.10611764 -3.09733526

you can use tidyr to reshape it into a long form that is easier to work with:

> tidy_tbl <- tbl %>% tidyr::gather(var, x, -y)
> head(tidy_tbl)
            y var          x
1 -3.54276780   A -1.3430281
2 -0.17595827   A -0.8045598
3  2.71317260   A  0.6432380
4  1.41677569   A  0.9282322
5 -0.02714028   A -0.2104841
6 -5.09970701   A -1.8264372

Then load broom and fit one model per var group with dplyr's group_by() and do():

> library(broom)
> fitted <- tidy_tbl %>% 
+     group_by(var) %>% 
+     do(model = lm(y ~ x, data = .))
> fitted
Source: local data frame [3 x 2]
Groups: <by row>

# A tibble: 3 x 2
  var   model   
* <chr> <list>  
1 A     <S3: lm>
2 B     <S3: lm>
3 C     <S3: lm>

You can then use broom's tidy() to turn the list column of fitted models into one summary row per coefficient. The p.value column on the rows where term is x holds the p-values you are after:

> fitted %>% tidy(model)
# A tibble: 6 x 6
# Groups:   var [3]
  var   term        estimate std.error statistic   p.value
  <chr> <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 A     (Intercept)   0.0744     0.305     0.244 0.814    
2 A     x             2.46       0.288     8.54  0.0000271
3 B     (Intercept)  -1.05       0.945    -1.11  0.298    
4 B     x             0.750      0.891     0.842 0.424    
5 C     (Intercept)  -0.842      0.920    -0.915 0.387    
6 C     x             0.610      1.26      0.485 0.641
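As far as I know, recent broom releases no longer support calling tidy() on the row-wise data frame that do() returns, and tidyr now recommends pivot_longer() over gather(). A sketch of the same pipeline with current tidyverse verbs might look like the following. It assumes dplyr 0.8.1 or later for group_modify() and tidyr 1.0.0 or later for pivot_longer(). The seed and the name pvals are my additions, so the numbers will not match the output above.

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(1)    # assumption: the original answer set no seed
tbl <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10)) %>%
  mutate(y = 2 * A + rnorm(10, .1))

pvals <- tbl %>%
  # Long form: one row per (y, predictor name, predictor value)
  pivot_longer(cols = -y, names_to = "var", values_to = "x") %>%
  group_by(var) %>%
  # Fit one model per group and return its coefficient table
  group_modify(~ tidy(lm(y ~ x, data = .x))) %>%
  ungroup() %>%
  # Keep only the slope rows, i.e. one p-value per predictor
  filter(term == "x") %>%
  select(var, estimate, p.value)

pvals
```

Filtering on term == "x" drops the intercept rows, so the result has exactly one row per predictor.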
