
I'm new to ggplot2 and am using it to plot several lines on one graph of distance (x) against mortality rate (y), one line per year. The lines display, but I can't sort out the legend, which should show which colour represents which year. I've read a lot about this, but I can't get `scale_fill_discrete` to change my legend accordingly. Here's my current code:

g <- ggplot(filtered, aes(x=filtered$distance)) + 
  geom_point(aes(y = filtered$RelativeDeaths.2014, color = "v"),size = 0.5) +  # basic graphical object
  geom_line(linetype = "solid", aes(y=filtered$RelativeDeaths.2014,color = "v")) +
  geom_point(aes(y = filtered$RelativeDeaths.2015,color = "x"),size = 0.5) +# first layer
  geom_line(linetype = "solid", aes(y=filtered$RelativeDeaths.2015,color = "x")) +
  geom_point(aes(y = filtered$RelativeDeaths.2016,color = "y"),size = 0.5) +
  geom_line(linetype = "solid", aes(y=filtered$RelativeDeaths.2016,color = "y")) +
  geom_point(aes(y = filtered$RelativeDeaths.2017,color = "z"),size = 0.5) +
  geom_line(linetype = "solid", aes(y=filtered$RelativeDeaths.2017,color = "z")) +
  scale_color_manual(
    values = c(v="red",x="blue",y="green",z="yellow"))+
        scale_fill_discrete(name = "Year", labels = c("2014", "2015", "2016","2017"))



g <- g + ylab("Relative Mortality Rates 2014 - 2017") + xlab("Distance To Canal") 

g

And here's a look at my filtered data frame with the relevant columns:

dput(head(filtered[cols], 20))
structure(list(distance = c(30.4493274665705, 32.690767619627, 
50.423978523969, 105.398975038182, 149.880076901593, 154.928665795813, 
178.886949742468, 197.37523391094, 200.977994666642, 201.635948013352, 
243.023605110627, 263.223206608342, 276.989624513379, 286.759943907289, 
291.861599835967, 292.419257603377, 292.463221848888, 309.224411286688, 
310.957457758306, 324.537645878657), RelativeDeaths.2014 = c(-5, 
-6, -5, -2, -4, -5, -2, -6, 5, -7, 2, -3, -5, -6, 6, -3, -4, 
-3, -5, -2), RelativeDeaths.2015 = c(-5, -5, -6, -2, -6, -7, 
-4, -2, 3, -4, 1, -3, -4, -5, -4, -7, -1, -8, -4, -3), RelativeDeaths.2016 = c(-3, 
-6, -2, -5, -3, -3, 2, -1, 2, -2, 1, -2, 4, 3, 2, 1, -5, -6, 
-4, -3), RelativeDeaths.2017 = c(-4, -6, -9, -5, -6, 0, -5, -3, 
-2, -7, -2, -1, -3, -1, 2, -1, -4, -4, -7, -5)), row.names = c(561L, 
562L, 599L, 606L, 563L, 709L, 594L, 603L, 598L, 612L, 572L, 597L, 
604L, 595L, 602L, 716L, 609L, 708L, 616L, 711L), class = "data.frame")

However, the values displayed in the legend are the keys I use for colour: v, x, y, z. Any help would be greatly appreciated!

  • Get rid of `filtered$` everywhere but in `ggplot(filtered, etc)`. – Rui Barradas Aug 18 '19 at 19:35
  • Can you post sample data? Please edit **the question** with the output of `dput(filtered)`. Or, if it is too big with the output of `dput(head(filtered, 20))`. – Rui Barradas Aug 18 '19 at 19:36
  • The output with both of those commands is far too big to post, I'm afraid. – Murray Ross Aug 18 '19 at 19:41
  • You only have 5 variables that matter, `distance` and `RelativeDeaths.2014` to `RelativeDeaths.2017`. If you post just those it should become (much) smaller. Is there a `colour` variable? Also, the problem seems to be a data format one, you have your data in wide format when it should be in long format. – Rui Barradas Aug 18 '19 at 19:46
  • How would I do that as a command? Sorry, I'm a relative newbie to R. Also, no, there is no `colour` variable in the data frame. – Murray Ross Aug 18 '19 at 19:50
  • First, `cols <- c("distance", "RelativeDeaths.2014", etc, "RelativeDeaths.2017")`. Then, `dput(head(filtered[cols], 20))` – Rui Barradas Aug 18 '19 at 19:55
  • Thanks, I've edited the main post accordingly. – Murray Ross Aug 18 '19 at 19:58

2 Answers

1

As Rui mentioned in the comments above, you can drop every `filtered$` from your variable selections.

You can also do away with `linetype = "solid"`, which is the default for `geom_line()`.

It may help to name the colors after the year you're plotting (e.g. "2014" instead of "x"), so that these names show up in the legend instead of the letters.

Finally, try splitting up the parameters in `scale_color_manual()`, using both `breaks` and `values`:

g <- ggplot(filtered) + 
  # 2014
  geom_point(aes(x = distance,
                 y = RelativeDeaths.2014, 
                 color = "2014"),
             size = 0.5) +
  geom_line(aes(x = distance,
                y = RelativeDeaths.2014,
                color = "2014")) +
  # 2015
  geom_point(aes(x = distance,
                 y = RelativeDeaths.2015,
                 color = "2015"),
             size = 0.5) +
  geom_line(aes(x = distance,
                y = RelativeDeaths.2015,
                color = "2015")) +
  # 2016
  geom_point(aes(x = distance,
                 y = RelativeDeaths.2016,
                 color = "2016"),
             size = 0.5) +
  geom_line(aes(x = distance,
                y = RelativeDeaths.2016,
                color = "2016")) +
  # 2017
  geom_point(aes(x = distance,
                 y = RelativeDeaths.2017,
                 color = "2017"),
             size = 0.5) +
  geom_line(aes(x = distance,
                y = RelativeDeaths.2017,
                color = "2017")) +
  scale_color_manual(breaks = c("2014",
                                "2015",
                                "2016",
                                "2017"),
                     values = c("red",
                                "blue",
                                "green",
                                "orange")) +
  ylab("Relative Mortality Rates 2014 - 2017") +
  xlab("Distance to Canal")

g
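
As for why the original `scale_fill_discrete()` call seemed to do nothing: the layers only map the colour aesthetic, so the legend belongs to the colour scale and a fill scale has nothing to act on. A minimal sketch of that point, keeping the question's letter keys and setting the legend title and labels on the colour scale instead. The `toy` data frame is an assumption for illustration (the first four rows of the posted `dput` output, distances rounded, two years only):

```r
library(ggplot2)

# Hypothetical stand-in for `filtered`: first four rows of the posted data,
# distances rounded, only two of the four year columns.
toy <- data.frame(
  distance            = c(30.45, 32.69, 50.42, 105.40),
  RelativeDeaths.2014 = c(-5, -6, -5, -2),
  RelativeDeaths.2015 = c(-5, -5, -6, -2)
)

p <- ggplot(toy, aes(x = distance)) +
  geom_line(aes(y = RelativeDeaths.2014, colour = "v")) +
  geom_line(aes(y = RelativeDeaths.2015, colour = "x")) +
  # The legend is drawn by the colour scale, so its title and labels go here.
  # `breaks` lists the keys used in aes(); `labels` gives the text shown for
  # each key, in the same order.
  scale_colour_manual(
    name   = "Year",
    breaks = c("v", "x"),
    labels = c("2014", "2015"),
    values = c(v = "red", x = "blue")
  ) +
  ylab("Relative Mortality Rates 2014 - 2015") +
  xlab("Distance to Canal")

p
```

Naming the keys after the years, as in the code above, gets the same legend text with less to keep in sync.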
  • Thanks, your solution is much more elegant and gives me exactly what I wanted! Thanks again! – Murray Ross Aug 18 '19 at 20:37
  • Glad it worked! I recommend you check out Rui's answer on reshaping the data. That's a great way to make the plotting code much cleaner and more succinct. – Lucas Graybuck Aug 18 '19 at 21:07
0

The main problem is the data format. The data is in wide format, whereas ggplot2 works better with data in long format. See this question for many ways of solving this issue.

I will use the `melt` function from package reshape2 to reshape the data. The plotting code then becomes very simple, with just one call each to `geom_line` and `geom_point`. The colouring code becomes simpler too: one column of the long format data set serves as the colour variable (ironically, it is named `variable`).

library(ggplot2)

df_long <- reshape2::melt(filtered, id.vars = "distance")

ggplot(df_long, aes(distance, value, colour = variable)) +
  geom_line() +
  geom_point() +
  scale_color_manual(
    name = "Year", 
    labels = c("2014", "2015", "2016","2017"),
    values = c("red", "blue", "green", "yellow")) +
  ylab("Relative Mortality Rates 2014 - 2017") + 
  xlab("Distance To Canal")
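
The same reshaping can also be done with `tidyr::pivot_longer()` instead of `melt`. This is a swapped-in alternative, not part of the answer above, and it assumes tidyr 1.0.0 or later is installed. Stripping the `RelativeDeaths.` prefix while reshaping leaves the year itself in the column, so the legend labels no longer have to be typed by hand. The `toy` data frame and the column name `year` are assumptions for illustration:

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for `filtered`: first four rows of the posted data,
# distances rounded, only two of the four year columns.
toy <- data.frame(
  distance            = c(30.45, 32.69, 50.42, 105.40),
  RelativeDeaths.2014 = c(-5, -6, -5, -2),
  RelativeDeaths.2015 = c(-5, -5, -6, -2)
)

# Wide to long: one row per (distance, year) pair.
# names_prefix removes "RelativeDeaths." so `year` holds "2014", "2015".
toy_long <- pivot_longer(
  toy,
  cols         = starts_with("RelativeDeaths."),
  names_to     = "year",
  names_prefix = "RelativeDeaths.",
  values_to    = "value"
)

p <- ggplot(toy_long, aes(distance, value, colour = year)) +
  geom_line() +
  geom_point(size = 0.5) +
  # Named values tie each colour to a year explicitly, independent of order.
  scale_colour_manual(
    name   = "Year",
    values = c("2014" = "red", "2015" = "blue")
  ) +
  ylab("Relative Mortality Rates 2014 - 2015") +
  xlab("Distance To Canal")

p
```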


Rui Barradas