How to use purrr to iterate over every combo of covariates and outcomes in lm reg

Question

A common scenario for me is that I need to run basically the same regression model but over a series of different outcomes, and for sensitivity analyses I simultaneously need to iterate over different sets of covariates.

I'm still new to R, but using the below with purrr I'm able to iterate over outcomes and covariates, but it marches through pairs of lists in parallel of course, when I need it to march through every combination from each list.

What are some options for how iterative over all combinations of outcomes and covariates?

Also, does anyone know why the below code won't work with "map2?" I get the error message that "as_mapper(.f, ...) : argument ".f" is missing, with no default"

library(dplyr)
library(purrr)

dataset <- tibble(
    y1=rnorm(n=100),
    y2=rnorm(n=100),
    x1=rnorm(n=100),
    x2=rnorm(n=100))


outcomes <- dataset %>%
    select(y1,y2)

covars <- dataset %>%
    select(x1,x2)

paramlist <- list(covarL,outcomeL)

paramlist %>%
    pmap(~lm(.y ~ .x,data=dataset))

1

`dplyr::expand_grid` ? – Ben Bolker Sep 06 '21 at 16:25

TimTeaFan · Accepted Answer · 2021-09-06T16:44:30.597

There are many ways to do this in the larger tidyverse. I am a fan of dplyr::rowwise for this kind of calculations. We can use the colnames instead of the actual data and then create a matrix like tibble with tidyr::expand_grid which contains all combinations of outcomes and covars. Then we can use dplyr::rowwise and use lm inside list() together with reformulate which takes strings as inputs. To get the result we can use broom::tidy.

library(dplyr)
library(purrr)
library(tidyr)

dataset <- tibble(
  y1=rnorm(n=100),
  y2=rnorm(n=100),
  x1=rnorm(n=100),
  x2=rnorm(n=100))

outcomes <- dataset %>%
  select(y1,y2) %>% colnames

covars <- dataset %>%
  select(x1,x2) %>% colnames

paramlist <- expand_grid(outcomes, covars)

paramlist %>%
  rowwise %>% 
  mutate(mod = list(lm(reformulate(outcomes, covars), data = dataset)),
         res = list(broom::tidy(mod)))

#> # A tibble: 4 x 4
#> # Rowwise: 
#>   outcomes covars mod    res             
#>   <chr>    <chr>  <list> <list>          
#> 1 y1       x1     <lm>   <tibble [2 x 5]>
#> 2 y1       x2     <lm>   <tibble [2 x 5]>
#> 3 y2       x1     <lm>   <tibble [2 x 5]>
#> 4 y2       x2     <lm>   <tibble [2 x 5]>

^{Created on 2021-09-06 by the reprex package (v2.0.1)}

We can do the same thing with {purrr} instead of dplyr::rowwise:

paramlist %>% 
  mutate(mod = map2(outcomes, covars, ~ lm(reformulate(.y, .x), data = dataset)),
         res = map(mod, broom::tidy)) 

#> # A tibble: 4 x 4
#>   outcomes covars mod    res             
#>   <chr>    <chr>  <list> <list>          
#> 1 y1       x1     <lm>   <tibble [2 x 5]>
#> 2 y1       x2     <lm>   <tibble [2 x 5]>
#> 3 y2       x1     <lm>   <tibble [2 x 5]>
#> 4 y2       x2     <lm>   <tibble [2 x 5]>

^{Created on 2021-09-06 by the reprex package (v2.0.1)}

Another pure {purrr} solution is to use a nested map call. Since it is nested we need to flatten the results before we can use map(summary) on them.

# outcomes and covars are the same strings as above

outcomes %>% 
  map(~ map(covars, function(.y) lm(reformulate(.y, .x), data = dataset))) %>% 
  flatten %>% 
  map(summary)

#> [[1]]
#> 
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.20892 -0.56744 -0.08498  0.55445  2.10146 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.0009328  0.0923062  -0.010    0.992
#> x1          -0.0809739  0.0932059  -0.869    0.387
#> 
#> Residual standard error: 0.9173 on 98 degrees of freedom
#> Multiple R-squared:  0.007643,   Adjusted R-squared:  -0.002483 
#> F-statistic: 0.7548 on 1 and 98 DF,  p-value: 0.3871
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.11442 -0.59186 -0.08153  0.61642  2.10575 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.02048    0.09461  -0.216    0.829
#> x2          -0.05159    0.10805  -0.477    0.634
#> 
#> Residual standard error: 0.9197 on 98 degrees of freedom
#> Multiple R-squared:  0.002321,   Adjusted R-squared:  -0.007859 
#> F-statistic: 0.228 on 1 and 98 DF,  p-value: 0.6341
#> 
#> 
#> [[3]]
#> 
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.3535 -0.7389 -0.2023  0.6236  3.8627 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.08178    0.10659  -0.767    0.445
#> x1          -0.08476    0.10763  -0.788    0.433
#> 
#> Residual standard error: 1.059 on 98 degrees of freedom
#> Multiple R-squared:  0.006289,   Adjusted R-squared:  -0.003851 
#> F-statistic: 0.6202 on 1 and 98 DF,  p-value: 0.4329
#> 
#> 
#> [[4]]
#> 
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.4867 -0.7020 -0.1935  0.5869  3.7574 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.06575    0.10875  -0.605    0.547
#> x2           0.12388    0.12420   0.997    0.321
#> 
#> Residual standard error: 1.057 on 98 degrees of freedom
#> Multiple R-squared:  0.01005,    Adjusted R-squared:  -5.162e-05 
#> F-statistic: 0.9949 on 1 and 98 DF,  p-value: 0.321

^{Created on 2021-09-06 by the reprex package (v2.0.1)}

How to use purrr to iterate over every combo of covariates and outcomes in lm reg

1 Answers1

Linked