0

I have code to generate plots of interaction effects such as the following: (modelled in MTcars)


library(ggplot2)
library(tidyverse)

outcome_var_list = mtcars %>% select(qsec,hp) %>% names()
var_list = mtcars %>% select(cyl,wt) %>% names()


iterate <- sapply(outcome_var_list,function(k){

outcome_var_list = outcome_var_list[outcome_var_list == k] 
  
test= mtcars %>%
  select(all_of(c(outcome_var_list, var_list)), drat)%>%
  pivot_longer(all_of(var_list))%>%
  reframe(broom::tidy(lm(as.matrix(pick(all_of(outcome_var_list)))~
               name/factor(drat):value + 0, data = .)))%>%
  select(term, estimate)%>%
  mutate(color = str_extract(term, "(?<=name)[^:]+"),
         xvar = as.numeric(str_extract(term, "[0-9.]+")))%>%
  drop_na()



test %>%  ggplot(aes(x=xvar, y=estimate, color = color)) + geom_line() + theme_light() + ggtitle(label = paste0("Model 1: ",k)) + theme(plot.title = element_text(hjust = 0.5))

ggsave(file=paste0("file_",k,".png"), width = 19, height = 7, units = "cm",path= "descriptive_1/test")

})

This generates plots such as the following:

enter image description here

Which plots the estimate of the interaction effect between the variable of interest and every factor level of "drat". Each separate plot repeats this for different outcome variables as listed.

I now want to slightly modify this to (presumably) use the marginaleffects package. I want to perform a similar function as above (evaluating and plotting each interaction within group for every outcome variable), but I need this to be slightly different. I now aim to plot the slope of the interaction effect (by subtracting the marginal effects) for each interaction with drat. I believe this can be done using the plot_slopes function, and setting factor levels of "drat" as the "condition" argument and the interaction with the var_list variable. However, I am unsure how to iterate this function similar to the above function. I also do not know how to plot these slopes together on the same ggplot2 plot as above.

Additionally, instead of plotting a continuous line, I would like to plot full confidence intervals, (lower and upper bound with point estimate) either connected with a line and shaded areas, or disconnected.

Is it possible to iterate over the plotting of the slopes similar to my previous example? Is this the right case for the marginaleffects package, or is there a base R/alternative solution that is more ideally suited to it? I also assume that to properly calculate these slopes I must switch from the : in the interaction term to * plot both the uninteracted effects in order to find the true slope. Is that the case?

Update

I see that this is indeed possible with marginaleffects, at least for manually plotting the interactions. For example, I can do the following:


library(marginaleffects)
lm1 = lm(data = mtcars, formula = hp ~ cyl*factor(drat))
lm2 = lm(data = mtcars, formula = hp ~ wt*factor(drat))
p1_data = plot_slopes(lm1, variables = "cyl", condition = "drat", draw = FALSE) %>% select(-cyl)
p2_data = plot_slopes(lm2, variables = "wt", condition = "drat", draw = FALSE) %>% select(-wt)

rbind(p1_data,p2_data) %>% ggplot() + geom_point(aes(x = factor(drat), y = predicted, color = term)) + geom_errorbar(aes(ymin = predicted_lo, ymax = predicted_hi, x = factor(drat), color = term)) 


Which generates the following plot:

enter image description here

Due to the scale, the error bars cannot be seen, but this is the correct plotting. However, my goal is to automate this, as in the above code, such that instead of individually plotting every regression for every variable in var_list, they are automatically fed in and generate lines with different colors for each variable. Then, I would iterate this over every single variable in outcome_var_list to produce multiple plots.

flâneur
  • 633
  • 2
  • 8
  • I spent some time on this but am sorry to say that I just don't understand what you need. I also don't understand the code (with many tidyverse functions I am not familiar with). And I don't understand the exact quantities you want to extract. If you provide a truly minimal example which extracts the exact quantities you want -- not "slightly modified" quantities -- I'll try to come back and see if I understand (no promises, though). – Vincent Jul 15 '23 at 20:14
  • Now that's quite a bunch to throw at an unsuspecting crowd ;-) ... please take a moment to pick a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example, https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/ ;-) – I_O Jul 15 '23 at 20:22
  • Apologies for the ambiguity, as I myself was not 100% sure what I was trying to produce. I have updated the question with further code to clarify. Many thanks for any and all support! – flâneur Jul 15 '23 at 20:38

1 Answers1

1

Maybe just use a simple loop? To me, this seems easy and readable:

library(ggplot2)
library(marginaleffects)

outcome_var_list = c("qsec", "hp")
regressor_var_list = c("wt", "cyl")
results = list()
for (o in outcome_var_list) {
  for (r in regressor_var_list) {
    f = paste(o, "~", r, "* factor(drat)")
    m = lm(f, mtcars)
    s = plot_slopes(m, variables = r, condition = "drat", draw = FALSE)
    tmp = s[, c("estimate", "conf.low", "conf.high", "drat")]
    tmp$outcome = o
    tmp$regressor = r
    results = c(results, list(tmp))
  }
}
results = do.call("rbind", results)

# remove outlier for nicer plot
results = subset(results, estimate < 4000)

ggplot(results, aes(x = drat, y = estimate, ymin = conf.low, ymax = conf.high)) +
  geom_pointrange() +
  facet_wrap(~outcome + regressor)

Vincent
  • 15,809
  • 7
  • 37
  • 39
  • 1
    Thank you so much, this approach works perfectly. I was clearly overcomplicating the matter! – flâneur Jul 16 '23 at 19:08
  • One more question (if you have the time). This plots the reference level (first level of drat in the interaction) as a value, of drat on the x axis. Can the estimate for this term be interpreted properly? If so, is it not actually the reference level? I suppose this is less a question of coding and more trying to understand the theory. This estimate is not actually displayed in the original regression dataframe (at least when ```summary()``` is run, so what does this value represent? – flâneur Jul 17 '23 at 15:36