
I'm using a data frame in R with 3 variables. I want to plot (with ggplot) 2 variables (CMod4X and CMod5X) as two distinct lines, as a function of the third variable (AmtX). In the end I succeeded in creating a graph that suits me, but I fail to include a legend. I have already consulted some other threads here, but the answers don't seem to work for me.

The (artificial) data set looks like this

AmtX <- seq(from = 1, to = 10001, by = 50)
CMod4X <- rnorm(201, mean = 0.87, sd = 0.01)
CMod5X <- rnorm(201, mean = 0.84, sd = 0.01)
EvalAmtX <- as.data.frame(cbind(AmtX,CMod4X,CMod5X))

I have made the plot like this

pltX <- ggplot(data = EvalAmtX, aes (x = AmtX)) + 
        geom_line(aes(y = CMod4X), color = "red", show.legend = TRUE) +
        geom_line(aes(y = CMod5X), color = "blue", show.legend = TRUE) +
        geom_smooth(aes(y = CMod4X), color = "red", se = FALSE, show.legend = TRUE) +
        geom_smooth(aes(y = CMod5X), color = "blue", se = FALSE, show.legend = TRUE) +
        labs(y = "C-index", x = "Amount (Tau)", title = "model 4 and model 5") +
        scale_colour_manual(name = "Models", values = c("CMod4" = "red", "CMod5" = "blue"))
pltX

But this plot doesn't include a legend.

The plot I obtain can be seen here

What am I doing wrong and what must I do to obtain a plot telling me the red line is CMod4 and the blue line is CMod5?

Thx for your help!! Leonard

OTStats

1 Answer


I guess you need to dive a little deeper into how ggplot2 works, since your question comes down to how the data frame is set up and how aesthetics are mapped. There are a lot of great resources on this topic, e.g. this one. Anyway, here are two solutions for putting a legend into your graph.

Solution 1: Rearrange data frame to long format

library(reshape2)
df <- melt(data = EvalAmtX, id.vars = "AmtX")

The data frame now looks like this:

head(df)
#   AmtX variable     value
# 1    1   CMod4X 0.8772716
# 2   51   CMod4X 0.8524197
# 3  101   CMod4X 0.8686019
# 4  151   CMod4X 0.8638835
# 5  201   CMod4X 0.8674627
# 6  251   CMod4X 0.8729925

Now, plotting is easy. Instead of telling ggplot2 the color of each individual line, you simply tell it which column in your data frame contains the factor that should determine the color of the lines. So you add another aesthetic (col = variable). This also automatically adds a legend for color.

ggplot(df, aes(x=AmtX, y=value, col = variable)) +
  geom_line()
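
As a side note, reshape2 still works but is no longer actively developed, and tidyr's pivot_longer() is the usual replacement these days. Below is a rough sketch of the same long-format approach with tidyr, also carrying over the smoothers, axis labels and legend title from the question. The names df_long and plt_long and the set.seed() call are my own additions for illustration, not something from the original post.

```r
library(ggplot2)
library(tidyr)

# Artificial data, same shape as in the question.
# set.seed() is an addition here so that the example is reproducible.
set.seed(42)
EvalAmtX <- data.frame(
  AmtX   = seq(from = 1, to = 10001, by = 50),
  CMod4X = rnorm(201, mean = 0.87, sd = 0.01),
  CMod5X = rnorm(201, mean = 0.84, sd = 0.01)
)

# Long format: one row per (AmtX, model) pair.
df_long <- pivot_longer(
  EvalAmtX,
  cols      = c(CMod4X, CMod5X),
  names_to  = "variable",
  values_to = "value"
)

# colour is mapped once in the global aes(), so geom_line() and
# geom_smooth() both inherit it and share a single legend.
plt_long <- ggplot(df_long, aes(x = AmtX, y = value, colour = variable)) +
  geom_line() +
  geom_smooth(se = FALSE) +
  labs(y = "C-index", x = "Amount (Tau)", title = "model 4 and model 5") +
  scale_colour_manual(
    name   = "Models",
    values = c(CMod4X = "red", CMod5X = "blue"),
    # Legend text, given in the same (alphabetical) order as the levels.
    labels = c("CMod4", "CMod5")
  )
plt_long
```

Note that pivot_longer() orders the rows differently from melt() (the two models are interleaved per AmtX value), but that makes no difference for the plot.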

Solution 2: Use a manual color scale

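You almost got it right in your code. ggplot2 only draws a legend for aesthetics that are mapped inside aes(); a color given outside aes() is a fixed setting, so scale_colour_manual() had nothing to attach to. Move color into aes() as a label, and let the manual scale translate each label into an actual color: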

pltX <- ggplot(data = EvalAmtX, aes (x = AmtX)) + 
    geom_line(aes(y = CMod4X, color = "CMod4")) +
    geom_line(aes(y = CMod5X, color = "CMod5")) +
    geom_smooth(aes(y = CMod4X, color = "CMod4"), se = FALSE) +
    geom_smooth(aes(y = CMod5X, color = "CMod5"), se = FALSE) +
    labs(y = "C-index", x = "Amount (Tau)", title = "model 4 and model 5") +
    scale_colour_manual(name = "Models", values = c(CMod4 = "red", CMod5 = "blue"))
pltX
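
If you want to see the difference between setting and mapping a color for yourself, you can peek at the layers of two small plots. This is only a sketch: p_set and p_map are names I made up, and it looks at ggplot2's internal layer objects (mapping and aes_params), which are not a documented interface and may change between versions.

```r
library(ggplot2)

# Same artificial data as in the question (seed added for reproducibility).
set.seed(42)
EvalAmtX <- data.frame(
  AmtX   = seq(from = 1, to = 10001, by = 50),
  CMod4X = rnorm(201, mean = 0.87, sd = 0.01),
  CMod5X = rnorm(201, mean = 0.84, sd = 0.01)
)

# Colour SET outside aes(): a constant parameter of the layer, no legend.
p_set <- ggplot(EvalAmtX, aes(x = AmtX)) +
  geom_line(aes(y = CMod4X), colour = "red")

# Colour MAPPED inside aes(): the label "CMod4" goes through the colour
# scale, which turns it into red and produces a legend entry.
p_map <- ggplot(EvalAmtX, aes(x = AmtX)) +
  geom_line(aes(y = CMod4X, colour = "CMod4")) +
  scale_colour_manual(name = "Models", values = c(CMod4 = "red"))

# Aesthetics mapped in the first layer of each plot.
names(p_set$layers[[1]]$mapping)
names(p_map$layers[[1]]$mapping)

# Constant aesthetics stored as parameters of the first layer.
names(p_set$layers[[1]]$aes_params)
```

As far as I can tell, colour only shows up among the mapped aesthetics for p_map, while for p_set it sits among the constant parameters, which is why your original plot had no legend.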
Rafor
  • Thanks @rafor ! That is indeed a nice and compact answer to my question. Changing the structure of the data was indeed the solution I have seen in some places. But I'm still wondering if it is possible to obtain a similar result without rearranging the data, keeping the data frame as is? – Leonard Mar 05 '20 at 15:08
  • I edited my answer and added a second solution that does not involve rearranging the data. Hope this helps! – Rafor Mar 06 '20 at 20:00
  • Thx for your solution. It really helps!! – Leonard Mar 08 '20 at 11:52