Create a legend for different models and datasets on same ggplot2

Question

I am plotting two regression models with the same independent variables and different dependent variables in the same graph using ggplot2. I would like to add a legend with one entry for each dependent variable. Other posts have suggested using the melt() function to do this; however, it is unclear how to do this while preserving the model output. For example:

require(ggplot2)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))
dat1 <- dat[,c(1,2,3)]
dat2 <- dat[,c(1,2,4)]

mod1 <- lm(y1 ~ x + z, data = dat1)
mod2 <- lm(y2 ~ x + z, data = dat2)


dat1$mod1 <- predict(mod1, newdata =dat1)  
err <- predict(mod1, newdata =dat1, se = TRUE)   
dat1$ucl <- err$fit + 1.96 * err$se.fit
dat1$lcl <- err$fit - 1.96 * err$se.fit   


dat2$mod2 <- predict(mod2, newdata =dat2)  
err <- predict(mod2, newdata =dat2, se = TRUE)   
dat2$ucl <- err$fit + 1.96 * err$se.fit
dat2$lcl <- err$fit - 1.96 * err$se.fit   


ggplot(dat1) + 
        geom_point(aes(x = x, y = mod1), size = .8, colour = "black") +
        geom_smooth(data = dat1, aes(x= x, y = mod1, ymin = lcl, ymax = ucl), 
                    size = 1, colour = "darkblue", se = TRUE, stat = "smooth", 
                    method = "lm") + 

        geom_point(data = dat2, aes(x = x, y = mod2), size = .8, colour = "black") +
        geom_smooth(data = dat2, aes(x= x, y = mod2, ymin = lcl, ymax = ucl), 
                    size = 1, colour = "darkred", se = TRUE, stat = "smooth", 
                    method = "lm") + 
        scale_colour_manual(values = c("y1" = "darkred", "y2" = "red" ))

Any thoughts on how to do this? Thanks.

Your call to `ggplot` uses 3 data frames (dat, dat1, dat2) but your code creates only `dat`. The answer will likely involve `melt()`, as you noted, but it will be easier to help you once you fix the data creation code to correspond to the data frames referenced by `ggplot`. — eipi10, Oct 15 '15 at 17:40

jeremycg · Accepted Answer · 2015-10-15T17:57:10.290

Here's how I would do it. First we gather everything (gather() is tidyr's counterpart to melt()) and then plot it. The key here is using group = model to separate the data by model. Since the data in the question has been edited, we first need to recombine it as it was:

dat <- dat1
dat$mod2 <- dat2$mod2

Then we use tidyr to turn it into long data:

library(tidyr)
dat1 <- dat %>% gather(model, val, mod1, mod2)
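In current tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping, assuming a tidyr version that provides pivot_longer() and using a small made-up data frame (the values are illustrative, not the question's data):

```r
library(tidyr)

# Toy wide data: one column of predictions per model (illustrative values only)
wide <- data.frame(x = c(0.1, 0.2), mod1 = c(1, 2), mod2 = c(3, 4))

# One row per (x, model) pair; the column names become the `model` key
long <- pivot_longer(wide, cols = c(mod1, mod2),
                     names_to = "model", values_to = "val")
```

The resulting `model` and `val` columns play the same role as in the gather() call, so the plotting code does not need to change.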

Then we can plot (note the rearranging of aes, you can call it once and then inherit):

library(ggplot2)
ggplot(dat1, aes(x = x, y = val, group = model)) + 
  geom_point( size = .8, colour = "black") +
  geom_smooth(aes(col=model), size = 1, se = TRUE, stat = "smooth", method = "lm")

As a side note, I'm not sure what you intend with ymin and ymax inside geom_smooth. They have no effect here, because the confidence band is calculated by geom_smooth itself. Maybe you want geom_ribbon to draw your own errors?
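A rough sketch of the geom_ribbon idea, under some assumptions that are not in the question: predictions are made over a grid of x with z held at its mean (so the band is a smooth curve in x), and `band()` is a hypothetical helper name introduced only for this example.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))

# Hypothetical helper: fitted values and approximate 95% limits from the model
# itself, over a grid of x with z fixed at its mean (an assumption of this sketch)
band <- function(fit, label) {
  grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 50),
                     z = mean(dat$z))
  p <- predict(fit, newdata = grid, se.fit = TRUE)
  data.frame(x = grid$x, model = label, fit = unname(p$fit),
             lcl = unname(p$fit - 1.96 * p$se.fit),
             ucl = unname(p$fit + 1.96 * p$se.fit))
}

bands <- rbind(band(lm(y1 ~ x + z, data = dat), "mod1"),
               band(lm(y2 ~ x + z, data = dat), "mod2"))

# The ribbon uses the model-based limits; fill and colour both produce a legend
p_ribbon <- ggplot(bands, aes(x = x, y = fit, group = model)) +
  geom_ribbon(aes(ymin = lcl, ymax = ucl, fill = model), alpha = 0.2) +
  geom_line(aes(colour = model))
```

Printing `p_ribbon` draws the plot. Unlike geom_smooth, nothing is refitted here, so the band reflects the standard errors from predict().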

Thanks, this is great. Out of curiosity, is there any way to just add it manually without melting all of the models together first? — coding_heart, Oct 15 '15 at 18:32
not [nearly as easily](https://stackoverflow.com/questions/17148679/ggplot2-need-to-construct-a-manual-legend-for-complicated-plot) — jeremycg, Oct 15 '15 at 18:35
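For reference, one common way to build such a legend by hand is to map colour to a constant label inside aes() for each layer and then assign the actual colours with scale_colour_manual. This is only a sketch of that general technique, not the approach from the linked question; the labels "y1" and "y2" and the legend title are arbitrary choices.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))
dat$mod1 <- predict(lm(y1 ~ x + z, data = dat))
dat$mod2 <- predict(lm(y2 ~ x + z, data = dat))

# A constant string inside aes() creates a legend key; the scale maps it to a colour
p_manual <- ggplot(dat, aes(x = x)) +
  geom_point(aes(y = mod1), size = .8) +
  geom_point(aes(y = mod2), size = .8) +
  geom_smooth(aes(y = mod1, colour = "y1"), method = "lm", formula = y ~ x) +
  geom_smooth(aes(y = mod2, colour = "y2"), method = "lm", formula = y ~ x) +
  scale_colour_manual(name = "Dependent variable",
                      values = c(y1 = "darkblue", y2 = "darkred"))
```

Printing `p_manual` draws the plot. Every additional model needs its own pair of layers and its own entry in `values`, which is why the long-format version scales better.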
