
I am using ggplot for my graphs, but when I try to add a legend it does not appear, and I can't find the mistake. I have tried the "scale_*_manual" functions, but the legend still doesn't show. Could you have a look?

Thanks!

ggplot(data = OD)+
  theme_light()+
  geom_line(aes(x=Days, y=Wildtype, group=1, color="darkorange2"), color="darkorange2", linetype="solid")+
  geom_point(aes(x=Days, y=Wildtype, group=1, color="darkorange2"),color="darkorange2", shape=15, size=1.5)+
  geom_errorbar(aes(x=Days, y=Wildtype, ymin=Wildtype-SD, ymax=Wildtype+SD),width=.2, position=position_dodge(0.05))+
  geom_line(aes(x=Days, y=Winter, group=1, color="cadetblue3"), color="cadetblue3", linetype="solid")+
  geom_point(aes(x=Days, y=Winter, group=1, color="cadetblue3"),color="cadetblue3", shape=15, size=1.5)+
  geom_errorbar(aes(x=Days, y=Winter, ymin=Winter-SD.1, ymax=Winter+SD.1),width=.2, position=position_dodge(0.05))+
  geom_line(aes(x=Days, y=Flagella_less, group=1, color="olivedrab3"), color="olivedrab3", linetype="solid")+
  geom_point(aes(x=Days, y=Flagella_less, group=1, color="olivedrab3"), color="olivedrab3", shape=15, size=1.5)+
  geom_errorbar(aes(x=Days, y=Flagella_less, ymin=Flagella_less-SD.2, ymax=Flagella_less+SD.2),width=.2, position=position_dodge(0.05))+
  labs(title="Growth curve",x="Days",y="OD750",color="Legend")+
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5,color="black",size=8),
        axis.text.y=element_text(angle=0,hjust=1,vjust=0.5,color="black",size=8),
        plot.title=element_text(hjust=0.5, size=12,face = "bold",margin = margin(t=0, r=10,b=10,l=10)),
        axis.title.y =element_text(size=10, margin=margin(t=0,r=10,b=0,l=0)),
        axis.title.x =element_text(size=10, margin=margin(t=10,r=10,b=0,l=0)),
        legend.position = "right")+
  scale_fill_discrete(name="Strain",breaks=c("Wildtype","Winter","Flagella_less"))




  
paula
  • Legends are made automatically when you map a column to an aesthetic. The easiest way is to transform your data into long rather than wide format--this will also greatly simplify your plotting code. [See this FAQ for a concise example](https://stackoverflow.com/q/3777174/903061)--see especially RubenLaguna's answer. If you need additional help, please share some sample data, e.g., `dput(OD[1:10, ])` to share a copy/pasteable version of the first 10 rows of your data, including all class and structure information. – Gregor Thomas Nov 10 '21 at 15:16
  • I have tried to look it up, but I am really new at this and I don't really know how to do it right... Is there any resource where I can learn how to convert the data to long format, and to understand what the difference is? – paula Nov 10 '21 at 15:31
  • [Here's the FAQ on reshaping data from wide to long](https://stackoverflow.com/q/2185252/903061). There are many approaches, I'd focus on the answer using the `tidyr` package. The [tidy data](https://r4ds.had.co.nz/tidy-data.html) chapter from R for Data Science is a very good and friendly introduction. – Gregor Thomas Nov 10 '21 at 15:41
  • Basically, you'll want your data to have a `strain` column that has values `"Wildtype", "Winter", "Flagella_less"` and a `value` column that has whatever values you are plotting and a single `SD` column. Then you'll be able to use `ggplot(data = OD, aes(x = Days, y = value, color = strain)) + geom_line() + geom_point(shape = 15, size = 1.5) + geom_errorbar(aes(ymin = value - SD, ymax = value + SD)) + scale_color_manual(name="Strain", values = c("Wildtype" = "darkorange2", "Winter" = "cadetblue3", "Flagella_less" = "olivedrab3"))`. – Gregor Thomas Nov 10 '21 at 16:29
  • Only 1 `geom_line` call, only 1 `geom_errorbar` call, only 1 `geom_point` call; only specify each color, shape, size, etc. once. – Gregor Thomas Nov 10 '21 at 16:31
  • Thanks a lot, Gregor! Do you know how I can make the line appear in the graph? I can only get dots, and I would also like to change the background and the axis lines. Is that possible? – paula Nov 12 '21 at 09:50
  • Share some copy/pasteable example data and I'll take a look. – Gregor Thomas Nov 12 '21 at 13:18
  • I uploaded another question about it! I think you can already have a look! Thanks :) – paula Nov 16 '21 at 15:30

1 Answer


OP. The comments on your question have already referenced some good approaches, although it seems you're still having some trouble integrating them. I'll try to make it simple and complete here for a beginner programming in R and using ggplot2. It will be somewhat long, but if you just want to skip:

TL;DR - convert your dataframe to long format using tidyr::pivot_longer(), then assign the color= aesthetic to match the newly-created column for names and map y= to the newly-created column for values.

Example Data

First, let's create an example dataset that mimics your question's case. In the future, I would advise you do the same in your question or use any one of the built-in datasets from base R or ggplot2 (you can view them by typing data() into the console).

library(ggplot2)

# example dataset
wide_data <- data.frame(
  Days=1:5,
  Wildtype = c(100, 120, 124, 105, 108),
  Winter = c(80, 82, 83, 90, 84),
  Flagella_less = c(25, 84, 90, 110, 113))

# basic plot from the wide data: one geom_line/geom_point pair per column
p <- ggplot(data=wide_data) +
  geom_line(aes(x=Days, y=Wildtype, color="red"), color="red") +
  geom_point(aes(x=Days, y=Wildtype, color="red"), color="red") +
  geom_line(aes(x=Days, y=Winter, color="blue"), color="blue") +
  geom_point(aes(x=Days, y=Winter, color="blue"), color="blue") +
  geom_line(aes(x=Days, y=Flagella_less, color="green3"), color="green3") +
  geom_point(aes(x=Days, y=Flagella_less, color="green3"), color="green3")
p


Placement of Aesthetic Arguments

There is one issue right from the start with the plot code for p above, which is that you have the general format: geom_*(aes(..., color="red"), color="red", ...). It's a good opportunity to demonstrate how placing the color= aesthetic inside the aes() function vs. outside it changes things.

When color= is placed outside aes(), then the aesthetic assigned is drawn for the whole geom. So, if you specify color="red", then you get a red geom. If you specify size=4, you get a geom drawn at size 4, and so on.

When color= is placed inside aes(), then that aesthetic is mapped to the data according to the assignment. So, geom_point(aes(color=columnA, ...)) would generate points and decide which points share the same color based on the value of columnA in your dataset. If you assign a character string, like color="darkred", this will not actually color the points dark red; instead, every observation is labeled as belonging to a group called "darkred". The color is then chosen from the default color scale.

If you specify in both places (like you have), then color= outside aes() will override any mapping (same result as not mapping in aes() at all).
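A quick way to check this override is to build two one-layer plots and look at the colors ggplot2 actually computes for them. This is only a minimal sketch: the toy data frame `df` and the plot names are made up for illustration and are not part of your data.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
df <- data.frame(x = 1:3, y = c(2, 4, 3))

# color= only inside aes(): "red" is treated as a group label, not as a color
p_mapped <- ggplot(df) +
  geom_point(aes(x = x, y = y, color = "red"))

# color= inside AND outside aes(): the value outside aes() wins
p_both <- ggplot(df) +
  geom_point(aes(x = x, y = y, color = "red"), color = "blue")

# layer_data() returns the data a layer draws, including the final colours
mapped_cols <- unique(layer_data(p_mapped)$colour)
both_cols <- unique(layer_data(p_both)$colour)

mapped_cols  # a colour taken from the default scale, not "red"
both_cols    # "blue"
```

Printing `p_mapped` shows a legend with the label "red", while `p_both` has no legend at all, which is the situation in the question.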

This can be quite useful, because when you place something inside the aes() function, the mapping creates a legend for that particular aesthetic... but again, the string is not used as the color itself. Instead, whatever is assigned to color= inside aes() is used as the label in the legend. See this example, where we remove the assignment of color= outside of aes():

p_inside_aes <- ggplot(data=wide_data) +
  geom_line(aes(x=Days, y=Wildtype, color="red")) +
  geom_point(aes(x=Days, y=Wildtype, color="red")) +
  geom_line(aes(x=Days, y=Winter, color="blue")) +
  geom_point(aes(x=Days, y=Winter, color="blue")) +
  geom_line(aes(x=Days, y=Flagella_less, color="green3")) +
  geom_point(aes(x=Days, y=Flagella_less, color="green3"))
p_inside_aes


The legend is created and labeled according to the assignment, but the colors come from the default color scale.

First (Bad) Solution

The above point leads to the first solution. This is to assign the label for your data within the geom calls to color inside of aes(), and then assign colors manually via scale_color_manual() if you are unhappy with the default coloring:

ggplot(data=wide_data) +
  geom_line(aes(x=Days, y=Wildtype, color="Wildtype")) +
  geom_point(aes(x=Days, y=Wildtype, color="Wildtype")) +
  geom_line(aes(x=Days, y=Winter, color="Winter")) +
  geom_point(aes(x=Days, y=Winter, color="Winter")) +
  geom_line(aes(x=Days, y=Flagella_less, color="Flagella_less")) +
  geom_point(aes(x=Days, y=Flagella_less, color="Flagella_less")) +
  scale_color_manual(values=c("Wildtype"="red", "Winter"="blue", "Flagella_less"="green3"))


This "solves" your problem, but it's a really bad approach to plotting and completely counter to the principles of tidy data upon which the ggplot2 package was built.

The Best Solution - Pivot your Data

The problem with the first approach becomes apparent when you have many columns in your data. With only three columns of data, you can already see how complicated the code is becoming. What about when you have 20 columns? 50? You can imagine the horror of typing out 100 lines of code for a simple plot.

The better way of thinking is to understand that each column is giving you values of a particular nature (your y axis values), and each column name is actually indicating something else (the "type" or whatever term you want to specify). What you actually want is to ask ggplot to plot the data where you assign columns for each aesthetic:

  • x = Days column
  • y = the value... here I guess "OD"? It's not a column that exists in your dataset, but is embedded... hidden... or spread among all the other columns
  • color = the type or name of the column. Here color is also not in one column, but spread out among the column names.

The best way to solve this is to gather all your spread-out data into this more meaningful structure, which is called tidy data. Luckily, it's easy to do this via code, as there are a number of functions that can do this sort of thing: reshape(), gather(), and melt() to name a few. Here I'll show you how to do this via pivot_longer() from the tidyr package:

long_data <- tidyr::pivot_longer(
  data=wide_data,
  cols=-Days,
  names_to="Type",
  values_to="OD")

We keep the Days column as is, but for everything else, we gather all the names of the columns into a new column called "Type" and we gather all the values in each of those columns into a new column called "OD".
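To see what the pivot does to the shape of the data, it can help to compare dimensions before and after. The sketch below rebuilds the example dataset, assuming one row per day (Days = 1:5), so it runs on its own.

```r
library(tidyr)

# Example dataset in wide format, assuming one row per day
wide_data <- data.frame(
  Days = 1:5,
  Wildtype = c(100, 120, 124, 105, 108),
  Winter = c(80, 82, 83, 90, 84),
  Flagella_less = c(25, 84, 90, 110, 113))

# Every column except Days is stacked into a Type/OD pair
long_data <- pivot_longer(
  data = wide_data,
  cols = -Days,
  names_to = "Type",
  values_to = "OD")

dim(wide_data)      # 5 rows, 4 columns
dim(long_data)      # 15 rows, 3 columns: Days, Type, OD
head(long_data, 3)  # the three Type values for Days == 1
```

Each of the 5 days now appears three times, once per original column, and the old column names have become values in Type.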

Then, the plot code is way way simpler and easier to understand:

ggplot(long_data, aes(x=Days, y=OD, color=Type)) +
  geom_line() + geom_point() +
  scale_color_manual(values=c("Wildtype"="red", "Winter"="blue", "Flagella_less"="green3"))

Line-by-line, it's doing this:

  • In the primary ggplot() call, you associate the dataset, and assign aesthetics that are applied to every geom: x, y, and color in one place.
  • The second line just asks to draw two geoms... not 6 like before, since ggplot will know how to color and draw separate lines based on the mapping defined in aes().
  • The scale_color_manual() function is not strictly necessary, but if you must define what colors you want to use, you can do it this way. There are a few other methods to use to define colors automatically, or you can just accept the default color scheme. Up to you.
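As one example of picking colors automatically instead of listing them by hand, you could swap in a ColorBrewer palette with scale_color_brewer(). This is just a sketch: the "Dark2" palette and the legend title "Strain" are arbitrary choices for illustration, and the data is rebuilt here (assuming one row per day) so the snippet runs on its own.

```r
library(ggplot2)
library(tidyr)

# Example dataset, assuming one row per day
wide_data <- data.frame(
  Days = 1:5,
  Wildtype = c(100, 120, 124, 105, 108),
  Winter = c(80, 82, 83, 90, 84),
  Flagella_less = c(25, 84, 90, 110, 113))

long_data <- pivot_longer(wide_data, cols = -Days,
                          names_to = "Type", values_to = "OD")

# Same plot as before, but the colors come from a named palette
p_brewer <- ggplot(long_data, aes(x = Days, y = OD, color = Type)) +
  geom_line() + geom_point() +
  scale_color_brewer(palette = "Dark2", name = "Strain")
p_brewer
```

Dropping the scale_color_brewer() line entirely gives the default colors, and the legend is still drawn because color is mapped inside aes().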

For the scale_color_manual() version, the result is a single plot with one colored line per Type and an automatically generated legend.
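If your data also has a standard deviation column for each strain, the same long layout extends to error bars, as suggested in the comments on the question (a single value column and a single SD column). The sketch below is built on assumptions: the column names OD_* and SD_* and all numbers are invented for illustration, so your own columns (for example SD, SD.1, SD.2) would need to be renamed to a consistent pattern first.

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: one OD_ and one SD_ column per strain (values invented)
wide_sd <- data.frame(
  Days = 1:3,
  OD_Wildtype = c(100, 120, 124),
  SD_Wildtype = c(5, 6, 4),
  OD_Winter = c(80, 82, 83),
  SD_Winter = c(3, 4, 2),
  OD_Flagella_less = c(25, 84, 90),
  SD_Flagella_less = c(2, 7, 5))

# ".value" turns the first regex group (OD or SD) into output columns,
# while the second group (the strain name) goes into the Type column
long_sd <- pivot_longer(
  wide_sd,
  cols = -Days,
  names_to = c(".value", "Type"),
  names_pattern = "^(OD|SD)_(.*)$")

# One call per geom; the error bars reuse the x, y, and color mapping
p_err <- ggplot(long_sd, aes(x = Days, y = OD, color = Type)) +
  geom_line() +
  geom_point(shape = 15, size = 1.5) +
  geom_errorbar(aes(ymin = OD - SD, ymax = OD + SD), width = 0.2)
p_err
```

Because OD and SD end up side by side in each row, a single geom_errorbar() call covers all strains.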

Long answer, but hopefully quite clear now.

chemdork123
  • It worked, thanks a lot! But now I am wondering, how to put error bars in the plot. In my data I have other columns that are the calculated standard deviation for each of them, but I don't know how to represent them now with the new code. – paula Nov 12 '21 at 09:47