I am plotting prediction lines from several models over some data points. I would like one legend showing which individual each point colour belongs to, and a second legend showing which model each line colour belongs to. Below is a made-up example for reproducibility:

library(lme4)    # lmer()
library(ggplot2) # ggplot() and friends

set.seed(123)
df <- data.frame(Height =rnorm(500, mean=175, sd=15),
                 Weight =rnorm(500, mean=70, sd=20),
                 ID = rep(c("A","B","C","D"), (500/4)))

mod1 <- lmer(Height ~ Weight + (1|ID), df)
mod2 <- lmer(Height ~ poly(Weight,2) + (1|ID), df)

y.mod1 <- predict(mod1, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 1
y.mod2 <- predict(mod2, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 2

df <- cbind(df, y.mod1,y.mod2)
df <- as.data.frame(df)

head(df)

    Height   Weight ID   y.mod1   y.mod2
1 166.5929 57.96214  A 175.9819 175.4918
2 171.5473 50.12603  B 176.2844 176.3003
3 198.3806 90.53570  C 174.7241 174.7082
4 176.0576 85.02123  D 174.9371 174.5487
5 176.9393 39.81667  A 176.6825 177.7303
6 200.7260 68.09705  B 175.5905 174.8027

First I plot my data points:

Plot_a <- ggplot(df,aes(x=Weight, y=Height,colour=ID)) + 
  geom_point() +
  theme_bw() +
  guides(color=guide_legend(override.aes=list(fill=NA)))

Plot_a

Then, I add lines relative to the prediction models:

Plot_b <- Plot_a + 
  geom_line(data = df, aes(x=Weight, y=y.mod1,color='mod1'),show.legend = T) + 
  geom_line(data = df, aes(x=Weight, y=y.mod2,color='mod2'),show.legend = T) +
  guides(fill = guide_legend(override.aes = list(linetype = 0)),
         color=guide_legend(title=c("Model")))

Plot_b

Does anyone know why I am not getting two different legends, one titled Model and the other ID?

I would like to get this

[Image: the desired plot, with two separate legends titled ID and Model]

Dekike

2 Answers


This type of problem generally has to do with reshaping the data. The data should be in long format, but it is in wide format. See this post on how to reshape the data from wide to long format.
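
To make the wide versus long distinction concrete, here is a minimal sketch on a made-up two-row data frame. The name `wide` and its values are illustrative assumptions, not the question's data; only the column names `y.mod1` and `y.mod2` are taken from it.

```r
library(tidyr)

# Wide format: one column per model (illustrative values only)
wide <- data.frame(
  Weight = c(60, 80),
  y.mod1 = c(176.0, 175.0),
  y.mod2 = c(175.5, 174.6)
)

# Long format: one row per (Weight, Model) pair, so that a single
# column ("Model") can later be mapped to one aesthetic in ggplot2
long <- pivot_longer(
  wide,
  cols = c(y.mod1, y.mod2),
  names_to = "Model",
  values_to = "Value"
)

long
```

Each original row becomes two rows, one per model, which is what lets a single `geom_line` draw both prediction lines.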

The plot layers become simpler: one geom_line is enough, and there is no need for guides() to override the aesthetics.

To customize the text of the models' legend, create a vector of labels, here written with plotmath to get math notation. The colors are set manually too.

library(dplyr)
library(tidyr)
library(ggplot2)

model_labels <- c(expression(X^1), expression(X^2))

df %>%
  pivot_longer(
    cols = c(y.mod1, y.mod2),
    names_to = "Model",
    values_to = "Value"
  ) %>%
  ggplot(aes(Weight, Height)) +
  geom_point(aes(fill = ID), shape = 21) +
  geom_line(aes(y = Value, color = Model)) +
  scale_color_manual(labels = model_labels, 
                     values = c("coral", "coral4")) +
  theme_bw()

Rui Barradas
  • Thanks @Rui, can I ask you something? If I want to rename `y.mod1` and `y.mod2` to `X¹` and `X²`, should I rename the variables at the beginning? Additionally, I have 5 lines in my real case, so I think it would be best to customize the colour of the lines. Do I do this with `scale_color_manual()`? Thanks again for your time. – Dekike Sep 12 '20 at 09:49
  • @Dekike Yes, that should be done with `scale_color_manual`. See the edit. The new colors are just an example. – Rui Barradas Sep 12 '20 at 10:02

The issue is that in ggplot2 each aesthetic can have only one scale and therefore only one legend. As you are using only the color aesthetic, you get one legend. If you want multiple legends for the same aesthetic, have a look at the ggnewscale package. Otherwise you have to make use of a second aesthetic.
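
For reference, the ggnewscale route might look like the sketch below. It is a minimal example under stated assumptions: the data frame `toy` is made up, and plain `lm()` fits stand in for the question's `lmer()` models so that the block only needs ggplot2 and ggnewscale. `new_scale_colour()` starts a second, independent colour scale for the layers added after it.

```r
library(ggplot2)
library(ggnewscale)

set.seed(123)
# Illustrative data shaped like the question's (not the original df)
toy <- data.frame(
  Weight = rnorm(100, mean = 70, sd = 20),
  Height = rnorm(100, mean = 175, sd = 15),
  ID     = rep(c("A", "B", "C", "D"), 25)
)

# Assumption: simple lm() fits replace the lmer() models for brevity
toy$mod1 <- predict(lm(Height ~ Weight, toy))
toy$mod2 <- predict(lm(Height ~ poly(Weight, 2), toy))

p <- ggplot(toy, aes(Weight, Height)) +
  geom_point(aes(colour = ID)) +
  scale_colour_discrete(name = "ID") +   # first colour scale: points
  new_scale_colour() +                   # layers below get a fresh colour scale
  geom_line(aes(y = mod1, colour = "mod1")) +
  geom_line(aes(y = mod2, colour = "mod2")) +
  scale_colour_manual(name = "Model",    # second colour scale: lines
                      values = c(mod1 = "coral", mod2 = "coral4")) +
  theme_bw()

p
```

The colour values are arbitrary; the point is that the scale declared before `new_scale_colour()` applies to the points and the one declared after it applies to the lines, giving two legends.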

My preferred approach would be similar to the one proposed by @RuiBarradas. However, to stick close to your approach this could be achieved like so:

  1. Instead of color, map on linetype in your calls to geom_line.
  2. Set the colors for the lines as arguments, i.e. not inside aes.
  3. Make use of scale_linetype_manual to get solid lines for both models.
  4. Make use of guide_legend to fix the colors appearing in the legend.

library(ggplot2)
library(lme4)
#> Loading required package: Matrix

set.seed(123)
df <- data.frame(Height =rnorm(500, mean=175, sd=15),
                 Weight =rnorm(500, mean=70, sd=20),
                 ID = rep(c("A","B","C","D"), (500/4)))

mod1 <- lmer(Height ~ Weight + (1|ID), df)
mod2 <- lmer(Height ~ poly(Weight,2) + (1|ID), df)

y.mod1 <- predict(mod1, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 1
y.mod2 <- predict(mod2, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 2

df <- cbind(df, y.mod1,y.mod2)
df <- as.data.frame(df)
Plot_a <- ggplot(df) + 
  geom_point(aes(x=Weight, y=Height, colour=ID)) +
  theme_bw() +
  guides(color=guide_legend(override.aes=list(fill=NA)))

line_colors <- scales::hue_pal()(2)
Plot_b <- Plot_a + 
  geom_line(aes(x=Weight, y=y.mod1, linetype = "mod1"), color = line_colors[1]) + 
  geom_line(aes(x=Weight, y=y.mod2, linetype = "mod2"), color = line_colors[2]) + 
  scale_linetype_manual(values = c(mod1 = "solid", mod2 = "solid")) +
  labs(color = "ID", linetype = "Model") +
  guides(linetype = guide_legend(override.aes = list(color = line_colors)))

Plot_b

stefan
  • Thank you very much @stefan, one (silly) question. If I want to name the lines `X¹` and `X²`, how should I do it? I mean, if I change `linetype = "mod1"` to `linetype = "X¹"` in your code, then I have to change `mod1 = "solid"` to `X¹ = "solid"`, but the latter is not allowed. I can't use `X¹` without `""`. How could I solve this? – Dekike Sep 12 '20 at 09:58
  • The easiest approach to achieve this is by setting the labels inside `scale_linetype_manual` using the labels argument, i.e. `labels = c(mod1 = bquote(~X^1), mod2 = bquote(~X^2))` – stefan Sep 12 '20 at 10:03
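
As a rough illustration of that last comment, the sketch below keeps the plain keys `mod1` and `mod2` in the code and changes only the legend text through `labels`. It is a variation rather than the commenter's exact call: it passes a plotmath `expression()` vector (as in the other answer) together with explicit `breaks`, and the data frame `toy` and the two curves are made-up assumptions.

```r
library(ggplot2)

# Illustrative data: two precomputed "prediction" lines
toy <- data.frame(x = 1:10)
toy$mod1 <- 2 + 0.5 * toy$x
toy$mod2 <- 2 + 0.05 * toy$x^2

line_colors <- c("coral", "coral4")

p <- ggplot(toy, aes(x)) +
  geom_line(aes(y = mod1, linetype = "mod1"), colour = line_colors[1]) +
  geom_line(aes(y = mod2, linetype = "mod2"), colour = line_colors[2]) +
  scale_linetype_manual(
    name   = "Model",
    breaks = c("mod1", "mod2"),                   # keys used inside aes()
    values = c(mod1 = "solid", mod2 = "solid"),   # both lines drawn solid
    labels = expression(X^1, X^2)                 # legend text, in break order
  ) +
  # show the real line colours in the legend keys
  guides(linetype = guide_legend(override.aes = list(colour = line_colors)))

p
```

Because the keys stay as ordinary strings, there is no need to use `X¹` as a name in `values`; only the displayed labels change.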