I'm struggling to learn the ins and outs of R, ggplot2, etc. I'm more used to being taught an entire (fixed) coding language in an A-to-Z manner, and I'm not used to open source (I learned to code when dinosaurs roamed the earth). So I have kludged together the following code to create one graph. Only ... I don't have the duplicate-legends problem: I have no legend at all!

erc <- ggplot(usedcarval, aes(x = usedcarval$age))   +
  geom_line(aes(y = usedcarval$dealer), colour = "orange", size = .5) +
  geom_point(aes(y = usedcarval$dealer), 
             show.legend = TRUE, colour = "orange", size = 1) +
  geom_line(aes(y = usedcarval$pvtsell), colour = "green", size = .5) +
  geom_point(aes(y = usedcarval$pvtsell), colour = "green", size = 1) +
  geom_line(aes(y = usedcarval$tradein), colour = "blue", size = .5) +
  geom_point(aes(y = usedcarval$tradein), colour = "blue", size = 1) +
  geom_line(aes(y = as.integer(predvalt)), colour = "gray", size = 1) +
  geom_line(aes(y = as.integer(predvalp)), colour = "gray", size = 1) + 
  geom_line(aes(y = as.integer(predvald)), colour = "gray", size = 1) +
  labs(x = "Value of a Used Car as it Ages (Years)", y = "Dollars") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 60, vjust = .6)) 
erc 

I can't figure out how to put an image in this text since I have no link except to my dropbox...

I would appreciate any help. Sincerely, Stephanie

  • This will probably help: http://stackoverflow.com/questions/18394391/r-custom-legend-for-multiple-layer-ggplot (perhaps there's a better duplicate out there) – MrFlick Mar 15 '17 at 17:42
  • This looks like it could be a "canonical" duplicate: http://stackoverflow.com/questions/5027016/missing-legend-with-ggplot2-and-geom-line. – eipi10 Mar 15 '17 at 17:49
  • In relation to the issues you're having, the basic ideas with ggplot are (1) your data should be in "long" format (which will also mean you need only one `geom_line`, `geom_point`, etc.), (2) to map a column to color (or fill, shape, etc.), put it inside `aes()` (this will also generate a legend),... – eipi10 Mar 15 '17 at 17:55
  • ... and (3) don't repeat the name of the data frame when referring to these mapped columns (in other words, `tradein` instead of `usedcarval$tradein`), because you've already told `ggplot` to use the data frame `usedcarval` in the initial call to `ggplot`. See the links to the duplicate questions for more detail. – eipi10 Mar 15 '17 at 17:58
  • Start with something simpler (fewer series), with fake data that you generate with `rnorm`, etc., and try to get the plot you want. If that doesn't work, post that question with the reproducible example that you then have. An example close to what you really want. The problem is that your approach here is simply not "ggplot"-ish. ggplot is strange, but powerful, and takes a lot of getting used to. – Mike Wise Mar 15 '17 at 18:11
  • Possible duplicate of [Missing legend with ggplot2 and geom\_line](http://stackoverflow.com/questions/5027016/missing-legend-with-ggplot2-and-geom-line) – erc Mar 15 '17 at 18:26
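
Point (2) in the comments above can be sketched directly on wide data. This is only a minimal illustration: the `car_vals` data frame and its numbers are made up, and only two series are shown. A colour set outside `aes()` just paints the layer, while a label placed inside `aes()` is treated as a mapped aesthetic, so ggplot draws a legend for it.

```r
library(ggplot2)

# Made-up stand-in for usedcarval (values are illustrative only)
car_vals <- data.frame(age = 1:5,
                       dealer  = c(12000, 10500, 9300, 8200, 7400),
                       tradein = c(9000, 7800, 6900, 6100, 5500))

# colour set outside aes(): the line is orange, but nothing is mapped,
# so there is no legend
no_legend <- ggplot(car_vals, aes(x = age)) +
  geom_line(aes(y = dealer), colour = "orange")

# A label inside aes() is a mapped aesthetic, so a legend appears;
# scale_colour_manual() assigns the actual colours to those labels
with_legend <- ggplot(car_vals, aes(x = age)) +
  geom_line(aes(y = dealer,  colour = "dealer")) +
  geom_line(aes(y = tradein, colour = "tradein")) +
  scale_colour_manual(name = "series",
                      values = c(dealer = "orange", tradein = "blue")) +
  labs(y = "Dollars")
```

This works for a couple of series, but it repeats one layer per column, which is why the comments recommend reshaping to long format instead.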

2 Answers


OK, I felt like doing some ggplot, and it was interesting to contrast the way ggplot beginners (I was one not so long ago) approach this with the way you need to do it to get things like legends.

Here is the code:

library(ggplot2)
library(gridExtra)
library(tidyr)
library(dplyr)   # provides filter(), used below

# fake up some data
n <- 100
dealer <- 12000 + rnorm(n,0,100)
age <- 10 + rnorm(n,3)
pvtsell <- 10000 + rnorm(n,0,300)
tradein <- 5000 + rnorm(n,0,100)
predvalt <- 6000 + rnorm(n,0,120)
predvalp <- 7000 + rnorm(n,0,100)
predvald <- 8000 + rnorm(n,0,100)
usedcarval <- data.frame(dealer=dealer,age=age,pvtsell=pvtsell,tradein=tradein,
                        predvalt=predvalt,predvalp=predvalp,predvald=predvald)

# The ggplot-naive way
erc <- ggplot(usedcarval, aes(x = usedcarval$age))   +
  geom_line(aes(y = usedcarval$dealer), colour = "orange", size = .5) +
  geom_point(aes(y = usedcarval$dealer), 
             show.legend = TRUE, colour = "orange", size = 1) +
  geom_line(aes(y = usedcarval$pvtsell), colour = "green", size = .5) +
  geom_point(aes(y = usedcarval$pvtsell), colour = "green", size = 1) +
  geom_line(aes(y = usedcarval$tradein), colour = "blue", size = .5) +
  geom_point(aes(y = usedcarval$tradein), colour = "blue", size = 1) +
  geom_line(aes(y = as.integer(predvalt)), colour = "gray", size = 1) +
  geom_line(aes(y = as.integer(predvalp)), colour = "gray", size = 1) + 
  geom_line(aes(y = as.integer(predvald)), colour = "gray", size = 1) +
  labs(x = "ggplot naive way - Value of a Used Car as it Ages (Years)", y = "Dollars") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 60, vjust = .6)) 

# The tidyverse way
#    ggplot needs long data, not wide data. 
#    Also we have two different sets of data for points and lines

gdf <- usedcarval %>% gather(series,value,-age)
pdf <- gdf %>% filter( series %in% c("dealer","pvtsell","tradein"))

# our color and size lookup tables
clrs <- c("dealer"="orange","pvtsell"="green","tradein"="blue","predvalt"="gray","predvalp"="gray","predvald"="gray")
# line sizes match the naive version: 0.5 for the observed series, 1 for the predictions
szes <- c("dealer"=0.5,"pvtsell"=0.5,"tradein"=0.5,"predvalt"=1,"predvalp"=1,"predvald"=1)

trc <- ggplot(gdf,aes(x=age)) + geom_line(aes(y=value,color=series,size=series)) + 
  scale_color_manual(values=clrs) +
  scale_size_manual(values=szes) +
  geom_point(data=pdf,aes(x=age,y=value,color=series),size=1) + 
  labs(x = "tidyverse way - Value of a Used Car as it Ages (Years)", y = "Dollars") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 60, vjust = .6)) 

grid.arrange(erc, trc, ncol=1)

[image: output of grid.arrange, with the naive plot above the tidyverse plot]

Study it, and especially look at gdf, pdf and gather. ggplot builds legends from columns mapped inside aes(), which in practice means you need "long data".
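
To see what `gather` and the `filter` step produce, it may help to run them on a tiny data frame and print the results. The sketch below uses made-up numbers and hypothetical names (`wide`, `gdf_small`, `pdf_small`) rather than the data above. Note that `series` and `value` are just column names chosen for the output, not anything built into tidyr.

```r
library(tidyr)
library(dplyr)

# Small made-up stand-in for usedcarval (values are illustrative only)
wide <- data.frame(age      = c(1, 2),
                   dealer   = c(12000, 11000),
                   tradein  = c(9000, 8000),
                   predvald = c(11800, 11100))

# Long form: every column except age is stacked into two new columns.
# "series" holds the old column name and "value" holds the number;
# both names are arbitrary choices.
gdf_small <- wide %>% gather(series, value, -age)
print(gdf_small)

# Subset for the points layer: observed prices only, no predictions
pdf_small <- gdf_small %>% filter(series %in% c("dealer", "tradein"))
print(pdf_small)
```

The wide table has 2 rows and 3 value columns, so the long table has 6 rows, one per combination of age and series.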

If you want more information on the "tidyverse", start here: Hadley Wickham's tidyverse

Mike Wise
  • Thank you Mike et al for all the kind & understanding answers, especially the code examples. I'm sorry I missed the previous duplicate questions in my search. This is probably another dumb question, but what is "tidyverse"? Also, I have never seen "gather" nor "series" before. Oh how I long for one indexed reference manual! Sigh. – Stephanie Moore Mar 16 '17 at 02:04
  • Could you please accept the answer. And then maybe upvote it :) ? – Mike Wise Mar 24 '17 at 16:30
  • Mike, I tried to accept both your answers but it will only let me accept 1. So I accepted the one you asked me to. I just looked up "upvoting" so I will try to do that in a second. Sorry. Ignorant here and much pressed for time. – Stephanie Moore Mar 25 '17 at 18:10

If you are looking for a short example of how to take series data that comes in wide format, convert it to long format (using gather), and then plot it with ggplot (with a legend), here is one I cooked up for someone recently:

library(ggplot2)
library(tidyr)

# womp up some fake news (uhh... data)
x <- seq(-pi,pi,by=0.25)
y <- sin(x)
yhat <- sin(x) + 0.4*rnorm(length(x))

# This is the data in wide form
#      ggplot will not make a legend for it on its own,
#      because it really wants long data
df1 <- data.frame(x=x,y=y,yhat=yhat)

# So we use gather from tidyr to make it into long data
#      it creates two new columns (series and value), puts the y and yhat
#      column names and values in them, and replicates x as needed
#      you have to look at the data frame to understand gather,
#      and read the docs a few times
df2 <- gather(df1,series,value,-x)   

# it is now in long form and we can plot it
ggplot(df2) + geom_line(aes(x,value,color=series))

So here is the plot:

[image: the resulting plot]
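
In tidyr 1.0.0 and later, `gather` is marked as superseded in favor of `pivot_longer`. Assuming a recent tidyr, the same example might be written as below. The `set.seed` call is an addition, there only to make the noise reproducible, and `p` is just a name for the plot object.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # assumption: fixed seed so the random noise is repeatable
x <- seq(-pi, pi, by = 0.25)
df1 <- data.frame(x = x, y = sin(x), yhat = sin(x) + 0.4 * rnorm(length(x)))

# names_to and values_to play the role of gather's "series" and "value" arguments
df2 <- pivot_longer(df1, cols = -x, names_to = "series", values_to = "value")

# Same plot as before: one geom_line, with colour mapped to the series column
p <- ggplot(df2) + geom_line(aes(x, value, color = series))
```

Typing `p` at the console (or calling `print(p)`) draws the plot.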

Mike Wise