I made a CDP plot with the stat_ecdf function, but I cannot add a legend. This is the code for the plot:

ggplot()+
      stat_ecdf(data=df, aes(Apple), geom = "step", col ="red")+
      stat_ecdf(data=df, aes(Ocean), geom = "step", col ="blue")+
      stat_ecdf(data=df, aes(Tree), geom = "step", col ="green")+
      stat_ecdf(data=df, aes(Citron), geom = "step", col ="yellow")+
      stat_ecdf(data=df, aes(Sun), geom = "step", col ="orange")+
      labs(y="Cumulative Probability",x="Proportion of samples (>1 FIB per 100 mL)",col="Legend")+
      theme_classic()

The plot appears, but without a legend. I would like a legend showing that the red line = Apple, the blue line = Ocean, etc. Thank you.

NelsonGon
Fraency
  • Please edit your question as shown [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Dec 09 '20 at 12:13
  • Alright, but why do you want to do this without `aes()`? If you do for example `aes(Apple, col = "Apple")` in the first `stat_ecdf()`, it should give a legend automatically and you can figure out what the colours should be in `scale_colour_manual(values = c(...))`. – teunbrand Dec 09 '20 at 12:58

1 Answer

ggplot works better with data in long format. Reshape your data so that the x values are in a single column and a second column identifies which series each value belongs to; then map that second column to the color aesthetic in your plot. You want your data in this form:

library(dplyr)  # provides bind_rows()

df = bind_rows(
  # apple df
  data.frame(x=runif(100,-10,10),col_column='Apple',
             stringsAsFactors = FALSE),
  # ocean df
  data.frame(x=runif(100,0,20),col_column='Ocean',
             stringsAsFactors = FALSE),
  # tree df
  data.frame(x=rnorm(100,mean=0,sd=2),col_column='Tree',
             stringsAsFactors = FALSE)
)

head(df)

#            x col_column
# 1 -3.3221018      Apple
# 2 -9.8157569      Apple
# 3  9.7496057      Apple
# 4 -8.4488035      Apple
# 5 -8.4584002      Apple
# 6  0.9613705      Apple

I can't see your data, but based on how you're using it I assume it just has the columns Apple, Ocean, Tree, etc. If so, you can use the gather function from tidyr to reshape it into long format.

df = tidyr::gather(df,key='col_column',value='x')
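
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), so the same reshape can also be written with that function. The following is a minimal sketch, not the asker's actual data: the data frame `wide` and its simulated values are assumptions, with column names borrowed from the question.

```r
library(tidyr)

# Hypothetical wide data standing in for the asker's df (values are simulated)
set.seed(1)
wide <- data.frame(Apple = runif(100),
                   Ocean = runif(100),
                   Tree  = runif(100))

# Stack every column into name/value pairs:
# the column name goes to col_column, the value goes to x
long <- pivot_longer(wide, cols = everything(),
                     names_to = "col_column", values_to = "x")

head(long)
```

The resulting `long` has one row per original cell, which is the shape the ggplot call expects.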

Then you can rearrange your ggplot call like this.

ggplot(data=df,mapping=aes(x,color=col_column)) +
  stat_ecdf(geom='step') +
  scale_color_manual(values=c('Apple'='red','Ocean'='blue','Tree'='green'))

example graphic

This keeps you from rewriting stat_ecdf for every column/color you want.
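
If reshaping is not an option, the approach suggested in the comments on the question should also give a legend: move the colour inside aes() as a constant label and set the actual colours in scale_colour_manual(). A rough sketch under the assumption of a wide data frame with simulated values (only two of the question's columns are shown, and the x-axis label is a placeholder):

```r
library(ggplot2)

# Hypothetical wide data; the values are simulated, not the asker's
set.seed(1)
wide <- data.frame(Apple = runif(100), Ocean = runif(100))

p <- ggplot(wide) +
  # A constant string inside aes() becomes a legend entry
  stat_ecdf(aes(Apple, colour = "Apple"), geom = "step") +
  stat_ecdf(aes(Ocean, colour = "Ocean"), geom = "step") +
  # Map each legend label to the line colour wanted in the question
  scale_colour_manual(values = c(Apple = "red", Ocean = "blue")) +
  labs(x = "Value", y = "Cumulative Probability", colour = "Legend") +
  theme_classic()

p
```

This still needs one stat_ecdf() call per column, so the long-format version above scales better as the number of series grows.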

cookesd