I'm trying to apply a function summarizing the linear relationships between an exposure variable (exp) and several outcome variables (out1, out2, etc) within groups. Consider the following toy data, along with a helper function to fit a model between two variables and return the desired output:

library(dplyr)
library(broom)  # provides tidy()

df <- tibble(group = sample(c("a", "b"), size = 100, replace = TRUE),
             exp = rnorm(100),
             out1 = rnorm(100, 4, 1),
             out2 = rnorm(100, 3, 1))

linear_beta <- function(y, x) {
  tidy(lm(y ~ x)) %>%
    filter(term == "x") %>%
    mutate(return = paste0("Beta = ", round(estimate, 2))) %>%
    pull(return)
}

If I use the helper function to summarize the relationship between the exposure and a single outcome within each group, it works:

df %>%
  group_by(group) %>%
  summarize(out1 = linear_beta(out1, exp))
# # A tibble: 2 x 2
# group out1       
# <chr> <chr>      
#  a     Beta = 0.01
#  b     Beta = 0.11

However, when I try to use summarize_at to find the relationships for both out1 and out2, I get an error:

df %>%
  group_by(group) %>%
  summarize_at(c("out1", "out2"), linear_beta, .$exp)

Error in summarise_impl(.data, dots) : Evaluation error: variable lengths differ (found for 'x').

As best I can tell, the lengths for the outcome and .$exp should be identical, though clearly I'm missing something. Any help would be appreciated!

Update:

It seems that the second argument, .$exp, does not have the grouping applied to it, as evidenced by the fact that the ungrouped version works:

df %>%
  # group_by(group) %>%
  summarize_at(c("out1", "out2"), linear_beta, .$exp)
# # A tibble: 1 x 2
# out1        out2       
# <chr>       <chr>      
# Beta = 0.08 Beta = 0.06

It's not clear to me how to get the grouping applied to .$exp, or whether that's even possible.
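
One way to check the diagnosis above is to compare lengths inside a grouped summarize. This is only a minimal sketch: the seed and the column names n_in_group, n_dot_exp, and n_exp are made up for illustration. It assumes the magrittr pipe (%>%), where `.` stands for the whole piped data frame.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- tibble(group = sample(c("a", "b"), size = 100, replace = TRUE),
             exp = rnorm(100))

check <- df %>%
  group_by(group) %>%
  summarize(n_in_group = n(),              # rows in the current group
            n_dot_exp  = length(.$exp),    # `.` is the full data frame: always 100
            n_exp      = length(exp))      # bare column name: sliced per group
check
```

If the diagnosis is right, n_dot_exp is 100 in every row while n_exp matches n_in_group, which would explain why lm() complains that the variable lengths differ.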

MeetMrMet
  • In the `lm` call you may need a `paste` to take care of multiple arguments i.e. `lm(paste0("y~ ", paste(x, collapse="+")))` – akrun Mar 08 '18 at 13:26
  • Maybe it is a matter of NAs: https://stackoverflow.com/questions/19771284/error-in-model-frame-default-variable-lengths-differ – ecp Mar 08 '18 at 13:27
  • @akrun perhaps I'm not fully understanding how `summarize_at` works, but I don't think both `out` variables should be passed to the `lm` call together - I thought each variable should take the place of the first argument in my function, and then I specified the second argument manually – MeetMrMet Mar 08 '18 at 13:33
  • Is it to be passed as separate arguments? – akrun Mar 08 '18 at 13:33
  • @ecp I thought that might be the case in my real data, but the data set here has no missing data and still gets the same error... – MeetMrMet Mar 08 '18 at 13:33
  • @akrun, sorry not totally clear what you're asking. I'm trying to change the dependent variable in the `lm` call with the same independent variable each time. So first regression is `lm(out1 ~ exp)` and second is `lm(out2 ~ exp)` – MeetMrMet Mar 08 '18 at 13:37

1 Answer

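We can try nesting the data by group and calling summarise_at on each nested data frame, so that .$exp refers only to that group's rows: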

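library(tidyr)  # nest(), unnest()
library(purrr)  # map()

df %>% 
  nest(data = -group) %>%  # tidyr >= 1.0 syntax; older versions used nest(-group)
  mutate(Col = map(data, ~ .x %>% 
                           summarise_at(c('out1', 'out2'), linear_beta, .$exp))) %>%
  select(group, Col) %>%
  unnest(Col)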

# A tibble: 2 x 3
#    group out1         out2       
#    <chr> <chr>        <chr>      
#1 a     Beta = -0.22 Beta = 0.27
#2 b     Beta = 0.1   Beta = 0.06
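
As a possible alternative that is not part of the original answer: assuming dplyr 1.0 or later, across() accepts a lambda in which bare column names such as exp are evaluated per group, so the nesting step may not be needed. The sketch below rebuilds the toy data with an arbitrary seed, so the estimates will differ from the output shown above.

```r
library(dplyr)
library(broom)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- tibble(group = sample(c("a", "b"), size = 100, replace = TRUE),
             exp = rnorm(100),
             out1 = rnorm(100, 4, 1),
             out2 = rnorm(100, 3, 1))

# Helper from the question: slope of y ~ x, formatted as text
linear_beta <- function(y, x) {
  tidy(lm(y ~ x)) %>%
    filter(term == "x") %>%
    mutate(return = paste0("Beta = ", round(estimate, 2))) %>%
    pull(return)
}

res <- df %>%
  group_by(group) %>%
  # .x is the current outcome column; exp is the grouped exposure column
  summarise(across(c(out1, out2), ~ linear_beta(.x, exp)))
res
```

The result should have one row per group and one column per outcome, the same shape as the nested version.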
akrun
  • Excellent, thank you! Didn't know about `nest()`. I added `... %>% select(group, Col) %>% unnest()` to get back to the form I needed, for anyone reading later – MeetMrMet Mar 08 '18 at 14:04