2

I'm trying to figure out how can I set up purrr to run several multiple regressions like the image below. As you will notice, this dataset describes an intervention program and we are analyzing this data using ANCOVA procedures (TIME 2 ~ TIME 1 + CONDITION).

om4g**TIME2**01 ~ om4g**TIME1**01 + CONDITION

example:

om4g201 ~ om4g01 + CONDITION

Regression models

Just in case someone want a reproducible code:

dataset <- data.frame(rest201=c(10,20,30,40),
                      rest101=c(5,10,20,24),
                      omgt201=c(40,10,20,10),
                      omgt101=c(10,20,10,05),
                      CONDITION=c(0,1))
lm(rest201~rest101+CONDITION, data=dataset)
lm(omgt201~omgt101+CONDITION, data=dataset)

I found just one similar question than mine here (Making linear models in a for loop using R programming) but the answer was not working.

Thanks!

acylam
  • 18,231
  • 5
  • 36
  • 45
Luis
  • 1,388
  • 10
  • 30

2 Answers2

4

Similar to @Roman's answer, here is how to do it using map2 from purrr:

library(purrr)

y_var = c("rest201", "omgt201")
x_var = list(c("rest101", "CONDITION"), c("omgt101", "CONDITION"))

map2(x_var, y_var, ~ lm(as.formula(paste(.y, "~", paste(.x, collapse = " + "))), data = dataset))

To get the summary table for each model, you can wrap each lm with summary and extract the coefficients table:

map2(x_var, y_var, ~ {
  lm(as.formula(paste(.y, "~", paste(.x, collapse = " + "))), data = dataset) %>%
    summary() %>%
    `$`("coefficients")
})

Result:

[[1]]
            Estimate Std. Error   t value   Pr(>|t|)
(Intercept) 2.779097 0.76821670  3.617596 0.17169133
rest101     1.377672 0.04750594 29.000000 0.02194371
CONDITION   3.800475 0.72163694  5.266464 0.11945968

[[2]]
                 Estimate Std. Error       t value  Pr(>|t|)
(Intercept)  3.000000e+01  16.666667  1.800000e+00 0.3228289
omgt101     -2.445145e-16   1.333333 -1.833859e-16 1.0000000
CONDITION   -2.000000e+01  14.529663 -1.376494e+00 0.3999753
acylam
  • 18,231
  • 5
  • 36
  • 45
  • Wow @useR, Amazing! Thanks much! That is what I was looking for! Could you please tell me how to report the significance of each predictor ? – Luis Nov 27 '17 at 20:49
  • 1
    Thanks much, @useR. Where I should click to accept your answer ? – Luis Nov 27 '17 at 21:06
1

You could construct a list of formulas for each model and use that to construct a model.

x <- c(101, 102, 103)
mdls <- sprintf("omg4g%s ~ om4g%s + CONDITION",
        as.character(x + 100), 
        as.character(x)
)

out <- sapply(mdls, FUN = function(x) {
  formula(x, data = latino_dataset)
})

$`omg4g201 ~ om4g101 + CONDITION`
omg4g201 ~ om4g101 + CONDITION
<environment: 0x0000000009aff7b8>

$`omg4g202 ~ om4g102 + CONDITION`
omg4g202 ~ om4g102 + CONDITION
<environment: 0x0000000009afda98>

$`omg4g203 ~ om4g103 + CONDITION`
omg4g203 ~ om4g103 + CONDITION
<environment: 0x00000000099b0828>

e.g.

sapply(out, FUN = lm)
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
  • Thanks much for your reply, @Roman Luštrik. I appreciate it! The code was almost 100%. I changed the last sentence to: sapply(out, FUN = lm, data=latino_dataset) (because R was returning an error). Unfortunately, the output is something like that: sapply(out, FUN = lm, data=latino_dataset) om4g201 ~ om4g101 + CONDITION coefficients Numeric,3 Could you please tell me how to display the summary instead this output. Thanks much again! – Luis Nov 27 '17 at 20:40
  • @Luis try adding `sapply(..., simplify = FALSE)` and you should get a list of the results. – Roman Luštrik Nov 27 '17 at 21:15