I am fitting a multiple linear regression in R and want to add a simple legend to a ggplot graph. The legend should show the points and the fitted lines with their corresponding colors. So far the plot works fine without a legend:

ggplot() +
  geom_point(aes(x = training_set$R.D.Spend, y = training_set$Profit),
             col = 'red') +
  geom_line(aes(x = training_set$R.D.Spend, y = predict(regressor, newdata = training_set)),
            col = 'blue') +
  geom_line(aes(x = training_set$R.D.Spend, y = predict(regressor_sig, newdata = training_set)),
            col = 'green') +
  ggtitle('Multiple Linear Regression (Training set)') +
  xlab('R.D.Spend [k$]') + 
  ylab('Profit of Venture [k$]')

How can I add a legend here most easily?

I tried the solutions from similar questions ("add legend to ggplot2" and "Add legend for multiple regression lines from different datasets to ggplot"), but did not succeed.

So I modified my original plot like this:

ggplot() +
  geom_point(aes(x = training_set$R.D.Spend, y = training_set$Profit),
             col = 'p1') +
  geom_line(aes(x = training_set$R.D.Spend, y = predict(regressor, newdata = training_set)),
            col = 'p2') +
  geom_line(aes(x = training_set$R.D.Spend, y = predict(regressor_sig, newdata = training_set)),
            col = 'p3') +
  scale_color_manual(
    name='My lines',
    values=c('blue', 'orangered', 'green')) +
  ggtitle('Multiple Linear Regression (Training set)') +
  xlab('R.D.Spend [k$]') + 
  ylab('Profit of Venture [k$]')

But now I get the error "Unknown colour name: p1", which makes some sense, as I never define p1 anywhere. How can I make ggplot recognise my intended legend?

Wurschti
  • The color statement has to be inside the `aes()`, e.g., `aes(x = training_set$R.D.Spend, y = training_set$Profit, color="p1")` – DaveArmstrong Feb 18 '21 at 22:43
  • Ahh yes, that makes sense, thank you @DaveArmstrong. I literally spent hours trying a lot of different things, but did not realise it was inside the wrong parentheses.
    How can I define which color belongs to which legend entry? Right now it seems to use the reverse order of my geom_*() calls by default. That's okay with 2 or 3 lines, but impossible to track with more (e.g. 10 lines).
    – Wurschti Feb 18 '21 at 23:35
  • It should go in alphabetical order of the labels in the color aesthetic. – DaveArmstrong Feb 18 '21 at 23:50
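
To make the label-to-color pairing from the comments explicit rather than dependent on alphabetical order, one option is to pass a named vector to `values`. The following is only a sketch on simulated data: `df`, `fit_lin`, `fit_quad`, and `line_cols` are hypothetical stand-ins, not the asker's `training_set` or models.

```r
library(ggplot2)

# Simulated stand-in data; the asker's training_set is not available here.
set.seed(1)
df <- data.frame(x = 1:30)
df$y <- df$x + rnorm(30)

# Two hypothetical models, standing in for regressor and regressor_sig.
fit_lin  <- lm(y ~ x, data = df)
fit_quad <- lm(y ~ poly(x, 2), data = df)

# Named vector: each name must match a label used inside aes(colour = ...).
line_cols <- c("Data" = "red", "Linear fit" = "blue", "Quadratic fit" = "green")

p <- ggplot(df, aes(x = x)) +
  geom_point(aes(y = y, colour = "Data")) +
  geom_line(aes(y = predict(fit_lin), colour = "Linear fit")) +
  geom_line(aes(y = predict(fit_quad), colour = "Quadratic fit")) +
  scale_colour_manual(name = "My lines",
                      values = line_cols,
                      breaks = names(line_cols))  # legend order follows the vector
print(p)
```

Because the colors are looked up by name, adding or reordering layers does not change which color a given label receives.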

1 Answer

Move col inside aes() so that it maps to a legend label, then set the actual color with scale_color_manual:

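library(ggplot2)

# Simulated data: a noisy straight line
set.seed(1)
x <- 1:30
y <- rnorm(30) + x

fit <- lm(y ~ x)
ggplot(data.frame(x, y)) + 
  geom_point(aes(x = x, y = y)) + 
  # col inside aes() is a legend label, not a color name
  geom_line(aes(x = x, y = predict(fit), col = "Regression")) + 
  # the scale maps each label to an actual color
  scale_color_manual(name = "My Lines",
                     values = c("blue"))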

LMc
  • Thanks a lot @LMc. Next I will try to link the lines of each geom to the legend, so that the colors match automatically. – Wurschti Feb 18 '21 at 23:39
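
One possible way to get that automatic link for many lines is to stack the predictions into a single long data frame and map color to a column that names the model. This is a sketch under assumptions, not the approach from the answer: `models`, `preds`, and `p_long` are hypothetical names, and the data are simulated.

```r
library(ggplot2)

# Simulated stand-in data.
set.seed(1)
df <- data.frame(x = 1:30)
df$y <- df$x + rnorm(30)

# Hypothetical list of fitted models; the list names become legend labels.
models <- list(
  "Linear"    = lm(y ~ x, data = df),
  "Quadratic" = lm(y ~ poly(x, 2), data = df),
  "Cubic"     = lm(y ~ poly(x, 3), data = df)
)

# Stack all predictions into one long data frame with a 'model' column.
preds <- do.call(rbind, lapply(names(models), function(nm) {
  data.frame(x = df$x,
             fitted = unname(predict(models[[nm]], newdata = df)),
             model = nm)
}))

# A single geom_line() draws one line per model and builds the legend itself.
p_long <- ggplot(df, aes(x = x)) +
  geom_point(aes(y = y)) +
  geom_line(data = preds, aes(y = fitted, colour = model)) +
  labs(colour = "My lines")
print(p_long)
```

With this layout, adding a tenth model only means adding one entry to the list; the default color scale assigns a distinct color per label, and `scale_colour_manual()` can still be added to fix specific colors.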