
I have a model with a factor and a continuous covariate. An ANOVA indicates that the main effects of both the factor and the covariate are significant (P < 0.05), but the interaction is not (P > 0.05). The factor has two levels.

To report the results, I used emmeans to extract the model estimates across the range of the covariate for both levels of the factor. From this I created a plot showing a different slope for each level of the factor, while stating in the text that the difference in slopes was not significant. Here is a simple made-up example:

library(emmeans)
library(ggplot2)

x <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
y <- c(1, 1.8, 3, 1.8, 0.7, 2, 2.7, 4, 0.8, 1.2, 1.4, 1.6, 0.7, 1.4, 1.6, 2.1)
f <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b")

df <- data.frame(x, f, y)

m <- lm(y ~ x * f, data = df)
anova(m)

plot.df <- data.frame(emmeans(m, ~ x * f, cov.reduce = FALSE))

ggplot(plot.df, aes(x = x, y = emmean, colour = f, fill = f)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), colour = NA, alpha = 0.2)

[Plot: fitted lines with a different slope for each factor level, each with a confidence ribbon.]

My colleague came back to me and said it is confusing to see different slopes in the plot when the ANOVA says they are not significantly different (in our real data the slope difference is larger than in my little example). I thought, okay, then I must be able to get the main effects averaged over the interaction, i.e. plot the same slope with a different intercept for each factor level... but I can't work out how to do this, and now I am wondering whether it's not easy because it's not the right thing to do.

So I don't know whether I need:

  • help using emmeans (or a similar function) to extract the main effects only, or
  • advice on whether it even makes sense to extract the main effects only (and if not, what to do instead).

I have tried the following, but it makes no difference:

plot.df <- data.frame(emmeans(m, ~ x + f, cov.reduce = FALSE))

Update: After a chat with a statistician colleague, I posted a similar question about how to do this with predict.lm(), without reference to emmeans or statistical validity. It turns out it is possible with predict.lm() (and, for what it's worth, my stats colleague sees no problem with the concept): How to edit interactions in model matrix used by predict.lm()?
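The core of that predict.lm() approach is to build the model matrix for a prediction grid and zero out the interaction column before multiplying by the fitted coefficients. A minimal base-R sketch of the idea using the toy data above (the column name `x:fb` assumes R's default treatment contrasts):

```r
# Sketch: main-effects-only predictions from the interaction model,
# obtained by zeroing the interaction column of the model matrix.
df <- data.frame(
  x = rep(1:4, 4),
  y = c(1, 1.8, 3, 1.8, 0.7, 2, 2.7, 4,
        0.8, 1.2, 1.4, 1.6, 0.7, 1.4, 1.6, 2.1),
  f = rep(c("a", "b"), each = 8)
)
m <- lm(y ~ x * f, data = df)

grid <- expand.grid(x = 1:4, f = c("a", "b"))
mm <- model.matrix(~ x * f, data = grid)
mm[, "x:fb"] <- 0                   # drop the interaction contribution
grid$fit <- drop(mm %*% coef(m))    # parallel lines: one slope, two intercepts
```

The resulting grid$fit can be fed straight into geom_line() in place of the emmean column; confidence ribbons are the harder part, as the comments below discuss.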

corn_bunting
  • Some of this question seems statistical to me. Opinions will vary if you ask about it on, e.g., Cross Validated. :) My 2 cents is that a big p-value does not indicate "no interaction", so if you were truly interested in the interaction scientifically you should leave it in and show different slopes. If the "true" model scientifically is parallel lines then a simple approach is to take the interaction out. You can also get an "averaged" slope via `emtrends()`, like: `emtrends(m, ~1, var = "x")`. That plus an intercept per group will allow you to draw lines (but no CI). – aosmith Jun 08 '21 at 22:33
  • Thanks @aosmith! I also lean toward showing the different slopes or doing a proper model selection procedure to simplify the model (and agree I should post on CV for more depth there) - but first wanted to make sure there wasn’t an obvious R script solution that everyone thought was completely fine. The emtrends function may be handy though, cheers (I suppose estimate CIs could be obtained with some slightly fiddly calculating from the CI of the slope…) – corn_bunting Jun 09 '21 at 06:57
  • I do think it would make sense in general to be able to calculate a main effect in the presence of a continuousXcategorical interaction like we do with categoricalXcategorical. My impression is that SE's are mathematically more difficult (but haven't thought about it recently) and I've wondered if a bootstrap-based CI would be sufficient to show the uncertainty around the fitted line. – aosmith Jun 09 '21 at 15:11
  • Just to note I tried using the slope from `emtrends(m, ~1, var = "x")` to see if I could draw the lines, but can't work out how to get the right intercepts... (but anyway I do now think simplifying the model first is a better approach) – corn_bunting Jun 10 '21 at 16:15
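A hedged sketch of one way to get those intercepts: the emmeans() group means are computed at the mean of the covariate, so a line through each group mean with the emtrends() averaged slope yields an intercept per level (this assumes the toy interaction model from the question):

```r
library(emmeans)

df <- data.frame(
  x = rep(1:4, 4),
  y = c(1, 1.8, 3, 1.8, 0.7, 2, 2.7, 4,
        0.8, 1.2, 1.4, 1.6, 0.7, 1.4, 1.6, 2.1),
  f = rep(c("a", "b"), each = 8)
)
m <- lm(y ~ x * f, data = df)

slope   <- summary(emtrends(m, ~ 1, var = "x"))$x.trend  # slope averaged over f
centers <- summary(emmeans(m, ~ f))                      # group means at mean(x)
intercepts <- centers$emmean - slope * mean(df$x)        # one intercept per level
# draw with e.g. geom_abline(slope = slope, intercept = intercepts[1]), etc.
```

As the comments note, this gives the lines but not their confidence ribbons.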

1 Answer


Statistical details may be better placed on Cross Validated, but if all you need is a plot of the main effects, just remove the interaction term from the model m:

library("ggplot2")
library("emmeans")

df <- data.frame(
  x = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  y = c(1, 1.8, 3, 1.8, 0.7, 2, 2.7, 4, 0.8, 1.2, 1.4, 1.6, 0.7, 1.4, 1.6, 2.1),
  f = c("a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
        "b", "b", "b", "b")
)

m <- lm(y ~ x + f, data = df)
anova(m)

plot.df <- data.frame(emmeans(m, ~ x + f, cov.reduce = FALSE))

ggplot(plot.df, aes(x = x, y = emmean, colour = f, fill = f)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), colour = NA, alpha = 0.2)
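As a quick sanity check that the additive fit really gives parallel lines, the predictions for the two factor levels differ only by a constant offset, which equals the "fb" coefficient under the default treatment contrasts (the data and fit are repeated here so the snippet runs on its own):

```r
# Check: predictions from the additive model differ by a constant offset,
# i.e. the two fitted lines share one slope.
df <- data.frame(
  x = rep(1:4, 4),
  y = c(1, 1.8, 3, 1.8, 0.7, 2, 2.7, 4,
        0.8, 1.2, 1.4, 1.6, 0.7, 1.4, 1.6, 2.1),
  f = rep(c("a", "b"), each = 8)
)
m <- lm(y ~ x + f, data = df)

grid <- expand.grid(x = 1:4, f = c("a", "b"))
grid$fit <- predict(m, newdata = grid)
offset <- grid$fit[grid$f == "b"] - grid$fit[grid$f == "a"]  # constant vector
```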
tpetzoldt
  • Thanks – this is practical for this example :-) In the case of a more complex full model though (our real one has three-way interactions) I think a more formal stepwise selection approach should be used to get to the simple model rather than just dropping all the non-significant interactions at once. But yes, that’s more of a CV topic! – corn_bunting Jun 09 '21 at 07:06
  • I agree. The answer concerns just the visualization of the model without the interaction. It can of course be combined with a model selection approach. In my experience, proper visualisation is key to understanding what the analysis does, in particular in the case of multiple interactions. – tpetzoldt Jun 09 '21 at 09:09