I have data in a CSV file like this:

Year,A,B,C,D,E,F,G
2007,3.35,,,,,,
2008,3.54,3.59,,,,,
2009,3.22,3.46,4.43,,,,
2010,3.82,3.63,4.64,,,,
2011,2.91,3.74,4.5,4.13,4.38,,
2012,3.85,3.57,4.13,4,4,4,
2013,4.33,2.93,4.63,4.71,4.25,,
2014,4.73,4,4.81,4.66,4.33,,4
2015,,,4.89,4.68,,,

I'm trying to plot it like this:

scores_raw = read.csv("scores.csv", header = TRUE, fill = TRUE)

scores_melt <- melt(scores_raw, id = "Year")

scores_symb <- c(15, 17, 16, 16, 16, 16, 16)  

plot_scores <- ggplot(scores_melt, aes(x=Year, y=value, colour=variable, shape=variable))
plot_scores +
  geom_line() + 
  geom_point(size = 10, alpha = 0.6) + 
  scale_shape_manual(values = scores_symb, 
                     name="Cohort\nSize",
                     labels=c("200", "100", "25")) +
  ylab("Score (5 = max)") + 
  scale_y_continuous(limits = c(0, 5)) +
  theme_bw() +
  theme(
    text = element_text(size=30)
    , axis.title.y=element_text(vjust=1.5)
    , axis.title.x=element_text(vjust=0.1)
    , plot.background = element_rect(fill = "transparent",colour = NA)
    , legend.justification=c(0,0), legend.position=c(0,0) #legend.position="none"
    , legend.background = element_rect(fill="transparent", size=.5, linetype="dotted")
  )

As you can tell, I've got 7 series but only want to tell them apart by 3 cohort sizes (i.e. shape).

I would like the legend to show only the three shapes that distinguish the three cohort sizes. At the moment, I can produce either a single legend with shapes and colours combined, or (as in the code above) two legends: one with the shapes (4 of which are NAs) and the other with the colours.

Help please!

suknat
  • [This](http://stackoverflow.com/questions/12410908/creating-a-ggplot-legend-with-both-color-and-shape) is relevant in terms of combining legends. It could even be a duplicate, except I'm not sure if you are trying for two legends or one. If combined, what should the labels be? If two, seems easiest to me to make a new variable representing cohort size that only has three levels. – aosmith Nov 25 '15 at 23:12
  • If `colour` and `shape` are both mapped to `variable`, then how come they have a different number of levels? I agree, if you want the shapes to reflect cohort size, make a variable for cohort size and map to that. See [here](http://stackoverflow.com/questions/14604435/turning-off-some-legends-in-a-ggplot) for info on how to turn the legend for colour off. – Axeman Nov 26 '15 at 10:01
  • @aosmith - I did look at that post on combining legends before posting, but eventually gave up :) I want the legend to show cohort size only, but I also want all the series to be visible separately (hence the need for colours). – suknat Nov 30 '15 at 10:40
  • @Axeman - your idea to create a variable for cohort size sounds like it might work. Any pointers on how to? Thanks. – suknat Nov 30 '15 at 10:43

1 Answer

You want shape to represent cohort size. Instead of hacking the scales, map cohort size to shape directly (rather than mapping `variable`). You can do this by creating a new variable called `cohort_size`.

Read in data

scores_raw <- read.table(text = "Year,A,B,C,D,E,F,G
2007,3.35,,,,,,
2008,3.54,3.59,,,,,
2009,3.22,3.46,4.43,,,,
2010,3.82,3.63,4.64,,,,
2011,2.91,3.74,4.5,4.13,4.38,,
2012,3.85,3.57,4.13,4,4,4,
2013,4.33,2.93,4.63,4.71,4.25,,
2014,4.73,4,4.81,4.66,4.33,,4
2015,,,4.89,4.68,,,", sep = ",", header = TRUE)

Melt and create the new variable

scores_symb <- c(15, 17, 16, 16, 16, 16, 16)
scores_melt <- reshape2::melt(scores_raw, id = "Year")
# Add the new variable
scores_melt$cohort_size <- scores_melt$variable
# Correctly map the levels
levels(scores_melt$cohort_size) <- scores_symb
# Reorder the levels (to show cohort sizes, you could add labels = c(200, 25, 100) here,
# since the question pairs shape 15 with 200, 17 with 100 and 16 with 25)
scores_melt$cohort_size <- factor(scores_melt$cohort_size, levels = 15:17)
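An alternative to relabelling the factor levels is to look the cohort size up by series name. The following is only a sketch in base R: `cohort_lookup` and `scores_long` are hypothetical names, the small data frame stands in for `scores_melt`, and the sizes are inferred from the legend labels in the question (A = 200, B = 100, everything else = 25).

```r
# Hypothetical lookup table: series name -> cohort size.
# Sizes are an assumption taken from the question's legend labels.
cohort_lookup <- c(A = 200, B = 100, C = 25, D = 25, E = 25, F = 25, G = 25)

# Small stand-in for the melted data (same columns as scores_melt)
scores_long <- data.frame(
  Year     = c(2007, 2008, 2008, 2009, 2009, 2009),
  variable = factor(c("A", "A", "B", "A", "B", "C")),
  value    = c(3.35, 3.54, 3.59, 3.22, 3.46, 4.43)
)

# Look up each row's cohort size by series name, then store it as a
# factor whose levels run from the smallest to the largest cohort
sizes <- cohort_lookup[as.character(scores_long$variable)]
scores_long$cohort_size <- factor(unname(sizes), levels = c(25, 100, 200))

# Cross-tabulate to check that each series landed in the intended cohort
table(scores_long$variable, scores_long$cohort_size)
```

With this approach the legend entries read as cohort sizes rather than shape codes, and the mapping from series to cohort is visible in one place.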

Create the plot

# Simplified ggplot call that looks ok on my screen
ggplot(scores_melt, aes(x = Year, y = value, colour = variable, shape = cohort_size)) +
  geom_line() +
  geom_point(size = 5, alpha = 0.6) +
  ylab("Score (5 = max)") +
  scale_colour_discrete(guide = "none") +  # guide = FALSE is deprecated in recent ggplot2
  theme_bw() +
  theme(legend.position=c(0.1, 0.8))
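To keep the specific shapes and the legend title from the question, `scale_shape_manual()` can be given a named vector of shapes. This is a minimal sketch rather than the exact plot above: `scores_long` and `cohort_shapes` are hypothetical names, the data are a small stand-in, and it assumes the levels of `cohort_size` are the cohort sizes themselves (25, 100, 200) rather than the shape codes.

```r
library(ggplot2)

# Stand-in data; cohort_size levels are assumed to be the cohort sizes
scores_long <- data.frame(
  Year        = c(2007, 2008, 2008, 2009, 2009, 2009, 2010, 2010, 2010),
  variable    = c("A", "A", "B", "A", "B", "C", "A", "B", "C"),
  value       = c(3.35, 3.54, 3.59, 3.22, 3.46, 4.43, 3.82, 3.63, 4.64),
  cohort_size = factor(c(200, 200, 100, 200, 100, 25, 200, 100, 25),
                       levels = c(25, 100, 200))
)

# Named vector: cohort size -> point shape, using the pairing from the question
cohort_shapes <- c("200" = 15, "100" = 17, "25" = 16)

p <- ggplot(scores_long,
            aes(x = Year, y = value, colour = variable, shape = cohort_size)) +
  geom_line() +
  geom_point(size = 5, alpha = 0.6) +
  # Names in `values` are matched against the levels of cohort_size
  scale_shape_manual(values = cohort_shapes, name = "Cohort\nSize") +
  # Hide the colour legend so only the three shapes are listed
  scale_colour_discrete(guide = "none") +
  ylab("Score (5 = max)") +
  theme_bw()

p
```

Because the shapes are matched by name, the order of `cohort_shapes` does not matter; the legend follows the level order of `cohort_size`.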


Axeman
  • Brilliant! In all my noodling with r and ggplot, I had never used `levels`. Always something new to learn! Thanks very much - you are a star! – suknat Nov 30 '15 at 19:35
  • Factors can be a pain, but sometimes they're really quite neat. Happy to help. – Axeman Nov 30 '15 at 20:18