2

Most of the time we run a regression with interactive terms, we are interested in a partial derivative. For example, consider the model below,

Model specification

If I am interested to know the effect of X1 on P(Y), or the partial derivative of X1 on P(Y), I need the following combination of coefficients:

Partial derivative of X1

Instead of calculating it by hand, I can use, for example, the lincom function in R to calculate linear combination of regression parameters. But I would like not only to know the numbers from calculations like this; I would like to plot them. The problem is, if I am using a R package to plot coefficients (e.g., coefplot) it plots the coefficients from my model, but with no option for linear combination of coefficients. Is there any way to combine the lincom function (or other function that calculates combination of parameter) with coefplot (or other coefficient plot packages with this option)?

Of course, in the example above I only consider the derivative of X1, and if I plot it I will have a plot with one dot and its confidence intervals only, but I would like to show in the plot the coefficients for the partial derivatives of X1, X2, and Z, as in the example below.

Coefficients plot (the one I have): Coefficients Plot

Combination of parameters or partial derivatives plot (the one I am trying to get): Combined Estimates

I discovered that Stata has a function that does what I am looking for, called "plotbeta." Does R have something similar?

Thiago
  • 173
  • 11
  • It seems like the graph you want would be an over-simplification of the effects. The first derivative of X1 depends on both X2 and Z, but you're looking for a single point. At what values of X2 and Z do you want the point to be calculated? – DaveArmstrong Feb 13 '21 at 19:07
  • I think I need to clarify here I want the first derivative and not the marginal effects (ME). ME is when you plot the effect of X1 based on different values of X2 or Z. But isn’t first derivative usually taken having “one-unit increasing” of your variables in mind? In this case, I will get one point with lower and upper bounds of the confidence intervals for the first derivative. Thanks for the comment. – Thiago Feb 14 '21 at 09:50
  • Yes, but as you show up above, the first derivative is itself a function of two variables - X2 and Z. This means that when a one-unit change in X1 will have a different effect depending on the values of X2 and Z. That's why I'm not sure what you want your plot to look like. – DaveArmstrong Feb 14 '21 at 18:47
  • It is easier than that, Dave. I just want to plot what we already do when calling, for example, command lincom (that computes point estimates, standard errors, t or z statistics, p-values, and CIs for linear combinations of coefficients after any estimation command). Precisely, after fitting a model and obtaining estimates for coefficients β1, β2,... ,βk, I want to plot the estimates for linear combinations of the βi, such as β1 + β2 + β3. In other words, I am interested in the combination of betas after the estimation, and not the values of the independent values before the estimation. – Thiago Feb 15 '21 at 09:30

1 Answers1

1

Here's a start. This defined a function called plotBeta(), the ... are arguments that get passed down to geom_text() for the estimate text.

plotBeta <- function(mod, confidence_level = .95, include_est=TRUE, which.terms=NULL, plot=TRUE, ...){
  require(glue)
  require(ggplot2)
  b <- coef(mod)
  mains <- grep("^[^:]*$", names(b), value=TRUE)
  mains.ind <- grep("^[^:]*$", names(b))
  if(!is.null(which.terms)){
    if(!(all(which.terms %in% mains)))stop("Not all terms in which.terms are in the model\n")
    ins <- match(which.terms, mains)
    mains <- mains[ins]
    mains.ind <- mains.ind[ins]
  }
  icept <- grep("Intercept", mains)
  if(length(icept) > 0){
    mains <- mains[-icept]
    mains.ind <- mains.ind[-icept]
  }
  if(inherits(mod, "lm") & !inherits(mod, "glm")){
    crit <- qt(1-(1-confidence_level)/2, mod$df.residual)
  }else{
    crit <- qnorm(1-(1-confidence_level)/2)
  }
  out.df <- NULL
  for(i in 1:length(mains)){
    others <- grep(glue("^{mains[i]}:"), names(b))
    others <- c(others, grep(glue(":{mains[i]}:"), names(b)))
    others <- c(others, grep(glue(":{mains[i]}$"), names(b)))
    all.inds <- c(mains.ind[i], others)
    ones <- rep(1, length(all.inds))
    est <- c(b[all.inds] %*% ones)
    se.est <- sqrt(c(ones %*% vcov(mod)[all.inds, all.inds] %*% ones))
    lower <- est - crit*se.est
    upper <- est + crit*se.est
    tmp <- data.frame(var = mains[i],  
                      lab = glue("dy/d{mains[i]} = {paste('B', all.inds, sep='', collapse=' + ')}"), 
                      labfac = i, 
                      est = est, 
                      se.est = se.est, 
                      lower = lower, 
                      upper=upper)
    tmp$est_text <- sprintf("%.2f (%.2f, %.2f)", tmp$est, tmp$lower, tmp$upper)
    out.df <- rbind(out.df, tmp)
  }
  out.df$labfac <- factor(out.df$labfac, labels=out.df$lab)
  if(!plot){
    return(out.df)
  }else{
    g <- ggplot(out.df, aes(x=est, y=labfac, xmin=lower, xmax=upper)) + 
      geom_vline(xintercept=0, lty=2, size=.25, col="gray50") + 
      geom_errorbarh(height=0) + 
      geom_point() + 
      ylab("") + xlab("Estimates Combined") + 
      theme_classic() 
    if(include_est){
      g <- g + geom_text(aes(label=est_text), vjust=0, ...)
    }
    g
  }
}

Here's an example with some made-up data:

set.seed(2101)
dat <- data.frame(
  X1 = rnorm(500), 
  X2 = rnorm(500), 
  Z = rnorm(500), 
  W = rnorm(500)
)
dat <- dat %>% 
  mutate(yhat = X1 - X2 + X1*X2 - X1*Z + .5*X2*Z - .75*X1*X2*Z + W, 
         y = yhat + rnorm(500, 0, 1.5))

mod <- lm(y ~ X1*X2*Z + W, data=dat)
plotBeta(mod, position=position_nudge(y=.1), size=3) + xlim(-2.5,2)

enter image description here

EDIT: comparing two models

Using the newly-added plot=FALSE, we can generate the data and then combine and plot.

mod <- lm(y ~ X1*X2*Z + W, data=dat)
p1 <- plotBeta(mod, plot=FALSE)
mod2 <- lm(y ~ X1*X2 + Z + W, data=dat)
p2 <- plotBeta(mod2, plot=FALSE)
p1 <- p1 %>% mutate(model = factor(1, levels=1:2, 
                                   labels=c("Model 1", "Model 2")))
p2 <- p2 %>% mutate(model = factor(2, levels=1:2, 
                                   labels=c("Model 1", "Model 2")))

p_both <- bind_rows(p1, p2)
p_both <- p_both %>% 
  arrange(var, model) %>% 
  mutate(labfac = factor(1:n(), labels=paste("dy/d", var, sep="")))

ggplot(p_both, aes(x=est, y=labfac, xmin=lower, xmax=upper)) + 
  geom_vline(xintercept=0, lty=2, size=.25, col="gray50") + 
  geom_linerange(position=position_nudge(y=c(-.1, .1))) + 
  geom_point(aes(shape=model), 
             position=position_nudge(y=c(-.1, .1))) + 
  geom_text(aes(label=est_text), vjust=0,
            position=position_nudge(y=c(-.2, .15))) + 
  scale_shape_manual(values=c(1,16)) + 
  ylab("") + xlab("Estimates Combined") + 
  theme_classic() 

enter image description here

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
  • That's awesome, Dave! Thank you so much. – Thiago Feb 15 '21 at 12:00
  • Dave, based on your example (i.e., keeping the same model specification), is it possible to plot the figure from the plotBeta you created showing only dy/DX2 and dy/DX1 in the figure? – Thiago Feb 15 '21 at 16:40
  • 1
    I updated the function so if you specify `which.terms=c("X1", "X2")` as an argument to the function it will just plot the X1 and X2 derivatives. – DaveArmstrong Feb 15 '21 at 17:48
  • Didn't work though. `plotBeta(mod, position=position_nudge(y=.1), size=3, which.terms=c("X1", "X2")) + xlim(-2.5,2) ` Error in plotBeta(mod, position = position_nudge(y = 0.1), size = 3, which.terms = c("X1", : object 'inds' not found` – Thiago Feb 16 '21 at 10:24
  • @Thiago Sorry, there was a typo in the code I posted. It should work now. – DaveArmstrong Feb 16 '21 at 12:15
  • Great! Thank you so much, Dave. Last one: Is it possible to have the terms of interest in the order I assign them in "which.terms" parameter? For example, if I call `which.terms=c("X1", "W")` then in the plot generated dy/dX1 will be the first estimate in the plot (the top line), and dy/dW will be the second one (the bottom line)? – Thiago Feb 16 '21 at 13:19
  • 1
    @Thiago I edited the function in the answer to produce the desired result. – DaveArmstrong Feb 16 '21 at 14:08
  • @DaveAmstrong, if I have the same parameters but from another model, is it possible to add the estimate points and lines (with the coefficient and confidence intervals) in dashed line type and open dot below each of the estimates I already have, and add a legend with Model 1 and Model 2. I mean, using your example, instead of having only mod, I could have "mod1" and "mod2". Then I would generate plot with plotbeta, but now with the same parameters from two models (to compare them). Does this make sense? – Thiago Feb 24 '21 at 13:55
  • 1
    @Thiago I added an option (with `plot=FALSE`) to return the data that is being generated to make the plot. I think that it would be easier to just run `plotBeta()` on two different models, save the data and append the two data sets to each other then make the plot by hand rather than try to have the function anticipate all of the different possibilities. I'll add an example that does that, too. – DaveArmstrong Feb 24 '21 at 14:19
  • Fantastic! Thanks, Dave. – Thiago Feb 24 '21 at 14:58
  • Dave, one issue. I realized that the function plotBeta only allows me to calculate the combination of the constitutive terms. For example, after conducting the model `mod <- lm(y ~ [B1_cons] + X1 + X2 + Z + X1*X2 + X1*Z + X2*Z + X1*X2*Z + W, data=dat)` I can use plotBeta to plot the partial derivative of X1 (that would combine B2 + B6 + B7 + B9). However, if I am interested in a specific combination such as B2 + B6 only, I can’t do that. – Thiago Feb 24 '21 at 15:33
  • That's true, the function isn't designed to do that because that wasn't part of the original question. Some of the guts of the function above could be modified to do that, but it would really take a whole different function to accomplish this goal. – DaveArmstrong Feb 24 '21 at 16:14