I am plotting two regression models that share the same independent variables but have different dependent variables in the same graph using ggplot2. I would like to add a legend with one entry per dependent variable. Other posts have suggested using the melt() function for this, but it is unclear how to do so while preserving the model output. For example:

require(ggplot2)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))
dat1 <- dat[,c(1,2,3)]
dat2 <- dat[,c(1,2,4)]

mod1 <- lm(y1 ~ x + z, data = dat1)
mod2 <- lm(y2 ~ x + z, data = dat2)


# Fitted values and approximate 95% confidence limits for model 1
dat1$mod1 <- predict(mod1, newdata = dat1)
err <- predict(mod1, newdata = dat1, se.fit = TRUE)
dat1$ucl <- err$fit + 1.96 * err$se.fit
dat1$lcl <- err$fit - 1.96 * err$se.fit

# Same for model 2
dat2$mod2 <- predict(mod2, newdata = dat2)
err <- predict(mod2, newdata = dat2, se.fit = TRUE)
dat2$ucl <- err$fit + 1.96 * err$se.fit
dat2$lcl <- err$fit - 1.96 * err$se.fit


ggplot(dat1) + 
        geom_point(aes(x = x, y = mod1), size = .8, colour = "black") +
        geom_smooth(data = dat1, aes(x= x, y = mod1, ymin = lcl, ymax = ucl), 
                    size = 1, colour = "darkblue", se = TRUE, stat = "smooth", 
                    method = "lm") + 

        geom_point(data = dat2, aes(x = x, y = mod2), size = .8, colour = "black") +
        geom_smooth(data = dat2, aes(x= x, y = mod2, ymin = lcl, ymax = ucl), 
                    size = 1, colour = "darkred", se = TRUE, stat = "smooth", 
                    method = "lm") + 
        scale_colour_manual(values = c("y1" = "darkred", "y2" = "red" )) 

Any thoughts on how to do this? Thanks.

coding_heart
  • Your call to `ggplot` uses 3 data frames (dat, dat1, dat2) but your code creates only `dat`. The answer will likely involve `melt()`, as you noted, but it will be easier to help you once you fix the data creation code to correspond to the data frames referenced by `ggplot`. – eipi10 Oct 15 '15 at 17:40
  • Apologies; appropriately edited. – coding_heart Oct 15 '15 at 17:44

1 Answer

Here's how I would do it. First, gather everything into long format (gather() is tidyr's equivalent of melt()), then plot it. The key is mapping group = model so that ggplot2 treats each model's values as a separate series. Since the question was edited to split the data into dat1 and dat2, we first need to recombine it:

dat <- dat1
dat$mod2 <- dat2$mod2

Then we use tidyr to turn it into long data:

library(tidyr)
dat1 <- dat %>% gather(model, val, mod1, mod2)
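In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step, assuming a recent tidyr is installed; `dat_demo` and `dat_long` are illustrative names standing in for the recombined `dat` and the long result:

```r
library(tidyr)

# Illustrative stand-in for the recombined `dat`: one column of fitted values per model
set.seed(123)
dat_demo <- data.frame(x = rnorm(5), mod1 = rnorm(5), mod2 = rnorm(5))

# One row per (observation, model) pair, like gather(model, val, mod1, mod2)
dat_long <- pivot_longer(dat_demo, cols = c(mod1, mod2),
                         names_to = "model", values_to = "val")
```

The rows come out in a different order than with gather() (interleaved by observation rather than stacked by column), which makes no difference for plotting.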

Then we can plot. Note that aes() has been rearranged: set the shared mappings once in ggplot() and the layers inherit them:

library(ggplot2)
ggplot(dat1, aes(x = x, y = val, group = model)) + 
  geom_point( size = .8, colour = "black") +
  geom_smooth(aes(col=model), size = 1, se = TRUE, stat = "smooth", method = "lm")

As a side note, the ymin and ymax aesthetics inside geom_smooth have no effect here: with stat = "smooth", geom_smooth computes its own confidence band from the fitted line. If you want to draw your own intervals, geom_ribbon may be what you're after.
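If the goal is to draw the intervals from predict() yourself, something along these lines might work. It is only a sketch, and it makes one assumption that is not in the question: because the models have two predictors, the band is computed over a grid of x with z held at its mean, so that it is a smooth function of x. `band()`, `bands` and `p_ribbon` are hypothetical names introduced for the illustration.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
mod2 <- lm(y2 ~ x + z, data = dat)

# Hypothetical helper: predictions over a grid of x, with z fixed at its mean (assumption)
band <- function(mod, label) {
  grid <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 50),
                     z = mean(dat$z))
  p <- predict(mod, newdata = grid, se.fit = TRUE)
  data.frame(x = grid$x, model = label,
             fit = as.numeric(p$fit),
             lcl = as.numeric(p$fit - 1.96 * p$se.fit),
             ucl = as.numeric(p$fit + 1.96 * p$se.fit))
}
bands <- rbind(band(mod1, "y1"), band(mod2, "y2"))

# The ribbon uses the precomputed limits; colour and fill are mapped to the model label
p_ribbon <- ggplot(bands, aes(x = x, y = fit, colour = model, fill = model)) +
  geom_ribbon(aes(ymin = lcl, ymax = ucl), alpha = 0.2, colour = NA) +
  geom_line()
```

Because colour and fill are mapped to `model` inside aes(), the legend appears without any manual scale.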

jeremycg
  • Thanks, this is great. Out of curiosity, is there any way to just add it manually without melting all of the models together first? – coding_heart Oct 15 '15 at 18:32
  • not [nearly as easily](https://stackoverflow.com/questions/17148679/ggplot2-need-to-construct-a-manual-legend-for-complicated-plot) – jeremycg Oct 15 '15 at 18:35
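For completeness, a rough sketch of the manual route asked about in the comments, without reshaping. The scale_colour_manual() call in the question had no effect because colour was set as a fixed value outside aes(), so there was nothing for the scale to map. Putting a constant label inside aes() gives the scale something to work with. The labels "y1" and "y2" and the name `p_manual` are illustrative choices, and the data are regenerated here only so the block runs on its own:

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))
dat$mod1 <- predict(lm(y1 ~ x + z, data = dat))
dat$mod2 <- predict(lm(y2 ~ x + z, data = dat))

# A constant string inside aes() becomes a legend key;
# scale_colour_manual() then assigns each key an actual colour.
p_manual <- ggplot(dat, aes(x = x)) +
  geom_point(aes(y = mod1), size = .8) +
  geom_point(aes(y = mod2), size = .8) +
  geom_smooth(aes(y = mod1, colour = "y1"), method = "lm", se = TRUE) +
  geom_smooth(aes(y = mod2, colour = "y2"), method = "lm", se = TRUE) +
  scale_colour_manual(name = "model",
                      values = c(y1 = "darkblue", y2 = "darkred"))
```

This stays manageable for two models, but every additional model needs its own pair of layers, which is where the long-format approach pays off.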