map/loop different regression models for multiple dependent and independent variables

Question

I know there is a lot on this topic, but I still could not figure it out myself. I have a dataset with various dependent (here: flux) and independent/ explanatory variables (expl). My goal is to iterate through all dependent with each explanatory variable using different models. In the end, I thought to pipe it through purr::glance(), so I could filter for significant p-values and compare the BICs of models.

Here comes a part of my dataset:

library(tidyverse)
library(broom)

dat <- structure(list(management = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "c"
  ), class = "factor"), flux1 = c(61.51, 73.3, 66.74, 117.75, 74.69, 
 74.12, 67.28, 78.67, 63.11, 82.74, 101.46, 80.48, 81.47, 56.76, 
 74.34), flux2 = c(0.72, 8.93, -0.2, 3.9, 0.05, 1.39, -0.69, 2.19, 
 3.31, 1.43, 8.57, 2.42, 2.56, 1.91, 10.41), flux3 = c(-52.97, 
 -59, -52.75, -44.95, -52.96, -62.68, -73.13, -120.33, -82.07, 
 -53.11, -36.01, -75.93, -84.01, -80.23, -127.61), expl1 = c(32.98, 
 76.13, 70.19, 106.29, 118.66, 102.61, 70.77, 45.93, 74.63, 90.48, 
 43.34, 98.7, 92.5, 73.68, 83.62), expl2 = c(8.9, 9, 9.03, 10.03, 
9.75, 9.85, 9.15, 9.28, 8.97, 9.25, 9.72, 9.32, 9.65, 8.82, 9.1
 ), expl3 = c(10.12, 11.62, 11.18, 12.9, 14.05, 13.93, 10.8, 10.85, 
  10.62, 11.97, 11.03, 7.38, 11.55, 12.1, 7.9)), row.names = c(NA, 
-15L), class = "data.frame")

So here is what I could get to work so far, inspired by this post:

a <- dat %>% 
  gather(key = "expl", value = "expl_value", c(expl1:expl3)) %>%
  group_split(expl) %>% 
  map_df(~lm(log(flux1) ~ expl_value, data = .x) %>% 
           glance() %>% 
           mutate(expl_group = unique(.x$expl)))

I could get it to iterate through the explanatory variables but not the dependent variables. And, I could only make it work for one model. I further made a list of models, like they did here and tried to map through that list but could not get it to work:

models <- function(.x) {
  linear <- lm(flux1 ~ expl_value, data = .x)
  exponential <- lm(log(flux1 ) ~ expl_value, data = .x)
  logarithmic <- lm(flux1 ~ log(expl_value), data = .x)
  power <- lm(log(flux1) ~ log(expl_value), data = .x)
  lst(linear, exponential, logarithmic, power)
}

Actually, I also want to expand that list of models by adding + management as a factor, but I can figure that out later. I would appreciate your help and am also open to other suggestions on how to try out various different models and compare their BICs.

Edit: As Stefano pointed out my fluxes can include also negative values (not only flux3). So I would need something like if flux <= 0, then add min(flux) before log-transformation.

Cheers, Anike

Hi! your example specifically won't work because `flux3` contain only negative numbers and the logarithm will produce NaNs — Stefano Barbi, Aug 23 '22 at 11:37
additionally the `models` function should be `function(.x)` in order to be consistent with the body — Stefano Barbi, Aug 23 '22 at 11:44
Hi, oh sorry yes I forgot about the negative data for flux3. Usually, I add the minimum value to the dataset to "make it positive". So is there a smart way of including: if flux contains negative values, add minimum value before log transformation? — anike, Aug 23 '22 at 12:13

Stefano Barbi · Accepted Answer · 2022-08-23T12:28:28.173

Here is a possible solution

library(dplyr)
library(tidyr)
library(purrr)

models <- function(.x) {
  linear <- lm(flux_value ~ expl_value, data = .x)
  exponential <- lm(log(flux_value ) ~ expl_value, data = .x)
  logarithmic <- lm(flux_value ~ log(expl_value), data = .x)
  power <- lm(log(flux_value) ~ log(expl_value), data = .x)
  list(linear = linear ,
       exponential = exponential,
       logarithmic = logarithmic,
       power = power) |>
      map_dfr(broom::glance, .id = "model")
}

dat |>
    pivot_longer(values_to = "flux_value",
                 names_to = "flux",
                 c(flux1:flux3)) |>
    group_by(flux) |>
    mutate(flux_value = if(any(flux_value < 0)) 
                        {
                            flux_value - min(flux_value) + 1e-6
                        }
                        else
                            flux_value) |>
    ungroup() |>
    pivot_longer(values_to = "expl_value",
                 names_to = "expl",
                 c(expl1:expl3)) |>
    group_by(expl, flux) |>
    group_modify(~models(.x))


## A tibble: 36 × 15
## Groups:   expl, flux [9]
#   expl  flux  model      r.squared adj.r.squared  sigma statistic p.value    df
#   <chr> <chr> <chr>          <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>
# 1 expl1 flux1 linear      0.0572        -0.0153  15.7     0.789     0.391     1
# 2 expl1 flux1 exponenti…  0.0670        -0.00478  0.188   0.933     0.352     1
# 3 expl1 flux1 logarithm…  0.0365        -0.0376  15.9     0.492     0.495     1
# 4 expl1 flux1 power       0.0448        -0.0287   0.190   0.609     0.449     1
# 5 expl1 flux2 linear      0.0167        -0.0589   3.55    0.221     0.646     1
# 6 expl1 flux2 exponenti…  0.00333       -0.0733   4.08    0.0435    0.838     1
# 7 expl1 flux2 logarithm…  0.00764       -0.0687   3.57    0.100     0.757     1
# 8 expl1 flux2 power       0.000465      -0.0764   4.09    0.00605   0.939     1
# 9 expl1 flux3 linear      0.00756       -0.0688  26.9     0.0991    0.758     1
#10 expl1 flux3 exponenti…  0.000325      -0.0766   4.81    0.00422   0.949     1
## … with 26 more rows, and 6 more variables: logLik <dbl>, AIC <dbl>,
##   BIC <dbl>, deviance <dbl>, df.residual <int>, nobs <int>

Nice, thank you! I added ` filter(!is.na(flux_value)) %>%` to kick out NA's for my original dataset and now it works. Can you help me with one more line of code ... ? I filtered this dataset for p.value < 0.05 and would like to automatically pick the model with the lowest BIC value — anike, Aug 23 '22 at 13:21

map/loop different regression models for multiple dependent and independent variables

1 Answers1