
The ggplot solution in this question worked really well for my data. However, I am trying to add a legend, and nothing I have tried works.

For example, in the ggplot example in the above question, how can I add a legend to show that the red curve is related to "Ocean" and the green curve is related to "Soil"? Yes, I want to add text that I define myself, and it is not related to any other variable in my data.frame.

The example below is some of my own data...

Rate     Probability      Stats
1.0e-04    1e-04          891.15 
1.0e-05    1e-04          690
...

etc. (it's about 400 rows). I have two data frames similar to the one above. My code is:

g <- ggplot(Master1MY, aes(Probability))
g <- g + geom_point(aes(y=Master1MY$Stats), colour="red", size=1)
g <- g + geom_point(aes(y=Transposon1MY$Stats), colour="blue", size=1)
g + labs(title= "10,000bp and 1MY", x = "Probability", y = "Stats")

The plot looks like this.

I just want a red and blue legend saying "Master" and "Transposon".

Thanks!

Fabs

1 Answer


In ggplot it is generally most convenient to keep the data in a 'long' format. Here I use the function melt from the reshape2 package to convert your data from wide to long format. Depending on how you specify the different aesthetics (size, shape, colour, etc.), the corresponding legends will appear.

library(ggplot2)
library(reshape2)

# data from the example you were referring to, in a 'wide' format.
x  <- seq(-2, 2, 0.05)
ocean <- pnorm(x)
soil <- pnorm(x, 1, 1)
df <- data.frame(x, ocean, soil)

# melt the data to a long format
df2 <- melt(data = df, id.vars = "x")

# plot, using the aesthetics argument 'colour'
ggplot(data = df2, aes(x = x, y = value, colour = variable)) + geom_line()

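To make the 'long' format concrete, here is a minimal base R sketch that builds the same shape of data by hand, without melt. The name df_long is only illustrative and not part of the original example; the column names x, variable and value are chosen to mirror what melt produces by default.

```r
# Same example data as above, in 'wide' form
x <- seq(-2, 2, 0.05)
ocean <- pnorm(x)
soil <- pnorm(x, 1, 1)

# Build the long format by hand: one row per (x, variable) pair.
# 'df_long' is a hypothetical name used only for this sketch.
df_long <- data.frame(
  x = rep(x, times = 2),
  variable = factor(rep(c("ocean", "soil"), each = length(x))),
  value = c(ocean, soil)
)

# Inspect the result: the grouping now lives in a single column
head(df_long, 3)
table(df_long$variable)
```

Because the grouping is stored in one column, mapping colour to that column is enough for ggplot to draw a legend.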

Edit: set the name and labels of the legend

# Manually set name of the colour scale and labels for the different colours
ggplot(data = df2, aes(x = x, y = value, colour = variable)) +
 geom_line() +
 scale_colour_discrete(name = "Type of sample", labels = c("Sea water", "Soil"))

Edit 2, following the new sample data

Convert your data, assuming it is organised as in your update, to a long format. Again, I believe you make your ggplot life easier if you keep your data in a long format. I relate every step to the simple example data I used in my first answer. Please note that there are many alternative ways to rearrange your data. This is one way, based on the small (non-reproducible) parts of your data you provided in the update.

# x  <- seq(-2, 2, 0.05)
# Master1MY$Probability
Probability <- 1:100

# ocean <- pnorm(x)
# Master1MY$Stats
Master1MY <- rnorm(100, mean = 600, sd = 20)

# soil <- pnorm(x,1,1)
# Transposon1MY$Stats
Transposon1MY <- rnorm(100, mean = 100, sd = 10)

# df <- data.frame(x, ocean, soil)
df <- data.frame(Probability, Master1MY, Transposon1MY)

# df2 <- melt(df, id.vars = "x")
df2 <- melt(df, id.vars = "Probability")

# default
ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
  geom_point()

# change legend name and labels, see previous edit using 'scale_colour_discrete'

# set manual colours scale using 'scale_colour_manual'.

ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
  geom_point() +
  scale_colour_manual(values = c("red","blue"), name = "Type of sample", labels = c("Master", "Transposon"))

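If the data already sit in two separate data frames with the same columns, as in the question, another possible route is to label each one and stack them with rbind. This is only a sketch: the two data frames below are simulated stand-ins for the real ones, and the column name Sample and the object names both and p are assumptions made for illustration.

```r
library(ggplot2)

# Simulated stand-ins for the two real data frames (values are made up)
set.seed(1)
Master1MY <- data.frame(Probability = 1:100,
                        Stats = rnorm(100, mean = 600, sd = 20))
Transposon1MY <- data.frame(Probability = 1:100,
                            Stats = rnorm(100, mean = 100, sd = 10))

# Tag each data frame with its origin, then stack them row-wise.
# 'Sample' is a hypothetical column name introduced for this sketch.
Master1MY$Sample <- "Master"
Transposon1MY$Sample <- "Transposon"
both <- rbind(Master1MY, Transposon1MY)

# Mapping colour to the label column is what produces the legend
p <- ggplot(both, aes(x = Probability, y = Stats, colour = Sample)) +
  geom_point(size = 1) +
  scale_colour_manual(values = c(Master = "red", Transposon = "blue")) +
  labs(title = "10,000bp and 1MY", x = "Probability", y = "Stats")
p
```

Naming the entries of values ties each colour to its label explicitly, so the result does not depend on the order of the factor levels.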

Henrik
  • It might be worth adding this example to the ggplot answer in the question linked above as well. For an answer with 12 upvotes, it's not really a great example of `ggplot` code. – joran Sep 26 '13 at 21:09
  • Good point. @Fabs was a bit unfortunate to stumble over a wide format example. – Henrik Sep 26 '13 at 21:15
  • Thanks! But I don't think this example will work for me now... (or maybe I didn't understand the example, and sorry about that)... I will edit my question and show part of my example... maybe there is a simple way of doing what I want to do .... thanks again! – Fabs Sep 26 '13 at 22:55
  • Again, I believe you make your `ggplot` life much easier if you keep your data in a data frame in a long format. Try to see how your different 'real' variables in your update correspond to the variables in the test data you referred to, and work your way to a (one) long data frame. I have added a small example. – Henrik Sep 26 '13 at 23:58
  • Hi Thanks! Now I understood and I could reproduce using my data. Thanks a lot again! – Fabs Sep 27 '13 at 02:08