Manually setting the legend color and text for multi-layer plot

Question

I have been dabbling with R on and off for the past 2 years, and only recently began using ggplot for building graphs. I am stuck on the situation described below:

I have a multilayer plot, where all data is in the same dataframe referenced in the initial ggplot call.

Layer 1 = Points
Layer 2 = Avg trend
Layer 3 = Derivative of the trend
Layer 4&5 = Prediction interval of the Points given the average trend

What I want to do is manually adjust the legend text to reflect the names and the colors/geoms for each layer (layers 4 & 5 can share one entry). When I try to do this, the legend keys for the geom_line layers all come back as red.

The other issue is that I want to rename the gradient scale, but when I do the gradient turns to discrete points rather than a bar.

ggplot(D_ff_NWn,aes(NW_norm,FF_Det,color = CYC))+
  geom_point()+
  scale_color_gradient(guide = guide_legend(title = "Feeder Cycle"))+
  geom_line(aes(NW_norm,FF10.fit,fill="black"), color="black", show.legend = TRUE) +
  geom_line(aes(NW_norm,Diff1*SCL_rg+SCL_FF[1],fill="red"), 
            color="red", show.legend = TRUE)+
  geom_line(aes(NW_norm,FF_UCL,fill="Prediction"),color="green")+
  geom_line(aes(NW_norm,FF_LCL),color="green")+
  labs(x = "Normalized Net Weight (%)")+
  scale_y_continuous("Feed Factor (g/rev)", 
                     sec.axis = sec_axis(~ (. - SCL_FF[1])/SCL_rg, 
                                         name = "1st Derivative ([g/rev]/%)"))+
  scale_fill_manual(name="",
                    labels = c("Avg FF (g/min)", "1st Derivative","95% Prediction"),
                    values = c("black","red","green"))+ 
  theme(axis.text.y.right = element_text(color = "red"), 
        axis.title.y.right = element_text(color = "red"))

So in summary, I would ideally like to have:

the gradient bar with my custom name for the first layer
a representative line for each layer with custom names

Note: The second y axis is scaled using SCL_FF and SCL_rg

I am sure I can't add enough data to generate the image shown above to this post, but the dataframe structure is shown below.

 'data.frame':  16141 obs. of  19 variables:
 $ key                : Factor w/ 6 levels "ATAB","CCNa",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Process_Time       : num  5.65 5.67 5.68 5.7 5.72 ...
 $ CONC_PCT           : num  32 32 31.8 31.7 31.6 ...
 $ STATE              : Factor w/ 4 levels "Blind","Gravimetric",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ NW                 : num  1.16 1.15 1.15 1.15 1.15 ...
 $ SRW_SP             : num  56.7 56.4 56.3 56.2 56 ...
 $ FF                 : num  2.36 2.37 2.37 2.37 2.37 ...
 $ MF                 : num  8 7.98 7.95 7.93 7.9 ...
 $ CYC                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Max_Mass           : num  1.72 1.72 1.72 1.72 1.72 ...
 $ NW_norm            : num  0.673 0.672 0.671 0.67 0.668 ...
 $ FF_Det             : num  2.33 2.33 2.33 2.34 2.34 ...
 $ FF10.fit           : num  2.34 2.34 2.34 2.34 2.34 ...
 $ FF10.se.fit        : num  0.000121 0.000121 0.000121 0.000121 0.000121 ...
 $ FF10.residual.scale: num  0.00458 0.00458 0.00458 0.00458 0.00458 ...
 $ FF10.df            : num  16128 16128 16128 16128 16128 ...
 $ Diff1              : num  0.0363 0.0363 0.0361 0.0344 0.0323 ...
 $ FF_UCL             : num  2.35 2.35 2.35 2.35 2.35 ...
 $ FF_LCL             : num  2.34 2.34 2.34 2.34 2.34 ...

Is there something I am blatantly missing? I thought I had a decent understanding of how the ggplot layers worked.

Appreciate any help or guidance.

Update 07May2018

The solution described below from Z.Lin worked. I reversed the gradient bar labels and changed the "1st Derivative" layer to points to hide the unwanted raster at the high end of the "Normalized Net Weight".

The new legend shows point/line keys for all layers. Can this be adjusted? I don't understand why it must default to this.
The other remaining issue is that the borders that come with the shape detract from how the gradient bar maps to the main plot. It might be OK, as the main plot is a detrended version of the Feed Factor in the window plotted against "Normalized Net Weight" instead of "Process Time".

Updated Plot

ggplot(D_ff_NWn, aes(x = NW_norm))+

  geom_point(aes(y = FF_Det, fill = CYC), shape = 21,stroke = 0.1) +
  geom_point(aes(y = Diff1 * SCL_rg + SCL_FF[1], colour = "1st Derivative"),
             size = 0.5) +
  geom_line(aes(y = FF10.fit, colour = "Avg FF (g/min)")) +
  geom_line(aes(y = FF_UCL, colour = "95% Prediction")) +
  geom_line(aes(y = FF_LCL, colour = "95% Prediction")) +

  labs(x = "Normalized Net Weight (%)")+
  scale_y_continuous(name = "Feed Factor (g/rev)", 
                     sec.axis = sec_axis(~ (. - SCL_FF[1])/SCL_rg, 
                                         name = "1st Derivative ([g/rev]/%)")) +
  scale_fill_gradient(name = "Feeder Cycle",
                      guide = guide_colourbar(reverse = TRUE)) +
  scale_colour_manual(name = "",
                      values = c("Avg FF (g/min)" = "black", 
                                 "1st Derivative" = "red",
                                 "95% Prediction" = "green"))+ 

  theme(axis.text.y.right = element_text(color = "red"), 
        axis.title.y.right = element_text(color = "red"))

This is very difficult to answer without a minimal reproducible example (i.e., include some sample data please! :)) — erocoar, May 05 '18 at 13:03
If you include your picture as a link, one of us here can help to edit & show it as an image in your question. O/w, as erocoar has already noted, please show some data. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) if you need further directions on how to do that. — Z.Lin, May 05 '18 at 13:04
I have added the graph link and the structure of the dataset, Is there any more I can do to get some expert help on this? — Anthony Tantuccio, May 07 '18 at 09:35

Z.Lin · Answer 1 · 2019-02-13T08:49:43.823

Without actual data to test, this is my best guess for what can work for you:

ggplot(D_ff_NWn, 
       aes(x = NW_norm))+

  geom_point(aes(y = FF_Det, fill = CYC), shape = 21, colour = alpha("black", 0)) +
  geom_line(aes(y = FF10.fit, colour = "Avg FF (g/min)")) +
  geom_line(aes(y = Diff1 * SCL_rg + SCL_FF[1], colour = "1st Derivative")) +
  geom_line(aes(y = FF_UCL, colour = "95% Prediction")) +
  geom_line(aes(y = FF_LCL, colour = "95% Prediction")) +

  labs(x = "Normalized Net Weight (%)")+
  scale_y_continuous(name = "Feed Factor (g/rev)", 
                     sec.axis = sec_axis(~ (. - SCL_FF[1])/SCL_rg, 
                                         name = "1st Derivative ([g/rev]/%)")) +
  scale_fill_gradient(name = "Feeder Cycle")+
  scale_colour_manual(name = "",
                      values = c("Avg FF (g/min)" = "black", 
                                 "1st Derivative" = "red",
                                 "95% Prediction" = "green")) + 

  theme(axis.text.y.right = element_text(color = "red"), 
        axis.title.y.right = element_text(color = "red"))

If this works, explanations are below for several issues I observed in your code:

Issue 1 - You want to specify different colour legends for the geom_point() & geom_line() layers

If you check ?geom_line, you can see that colour is listed among the aesthetics that it understands, but fill is not. This means that a fill = "red" (or some other colour) mapping inside geom_line(aes(...)) would be ignored.

On the other hand, geom_point understands both colour and fill. The default point shape is coloured based on the colour aesthetic, but there are other shapes that are coloured based on the fill aesthetic, with their outlines based on the colour aesthetic instead.

(Figure omitted: chart of the ggplot2 point shapes. Shapes 21-25 accept fill for their colours, & colour for their outlines.)

To address issue 1, choose a shape from 21-25, set geom_point's colour using aes(fill = CYC), and leave aes(colour = something) for your respective geom_line layers.
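Since the original data isn't available, here is a minimal sketch of the same idea on synthetic data. The data frame `df` and its columns `x`, `y`, `trend` and `cyc` are made up for illustration and are not part of the question's dataset.

```r
library(ggplot2)

# Synthetic stand-in data; every column name here is hypothetical
set.seed(1)
df <- data.frame(x = seq(0, 1, length.out = 50))
df$y     <- 2 + 0.5 * df$x + rnorm(50, sd = 0.05)  # noisy points
df$trend <- 2 + 0.5 * df$x                          # smooth trend line
df$cyc   <- rep(1:5, each = 10)                     # numeric cycle index

p <- ggplot(df, aes(x = x)) +
  # shape 21 takes its interior colour from `fill`
  geom_point(aes(y = y, fill = cyc), shape = 21, size = 2) +
  # lines only understand `colour`, so they get their own legend
  geom_line(aes(y = trend, colour = "Trend")) +
  scale_fill_gradient(name = "Cycle") +
  scale_colour_manual(name = "", values = c("Trend" = "black"))
```

Calling `print(p)` should draw the points with a gradient bar titled "Cycle" and a separate line key for "Trend", because the two layers now use different aesthetics and therefore different scales.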

Issue 2 - The order of the manually stated legend labels is probably wrong

I see you know that using something like aes(fill = "some value") in conjunction with scale_fill_manual() allows you to manually specify the aesthetic mapping, but the order of values in your code is:

aes(fill = "black") (layer 2)
aes(fill = "red") (layer 3)
aes(fill = "Prediction") (layer 4)

while the order in your manual scale is:

scale_fill_manual(name="", 
                  labels = c("Avg FF (g/min)", "1st Derivative","95% Prediction"),
                  values = c("black","red","green"))

Ignoring the colour vs. fill problem for the moment, we can see that the alphabetical order of values listed here is c("black", "Prediction", "red"). scale_fill_manual() has no way of knowing that Prediction is supposed to be mapped to green; it will simply take the list of values in alphabetical order, & match them sequentially to the labels / values listed herein.

To address issue 2, use a named vector for the values parameter in scale_XXX_manual instead.

(I also recommend using the labels as values in the various geom_line(aes(colour = "some value")) layers, and removing labels = c(...) from scale_XXX_manual altogether. I think it's neater that way.)
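To illustrate the named-vector approach, here is a small sketch on made-up data. The data frame `df` and its columns `fit`, `deriv` and `upper` are hypothetical. The named vector is deliberately written in a different order from the layers to show that matching happens by name, not by position.

```r
library(ggplot2)

# Hypothetical wide data: one column per line, as in the question
df <- data.frame(x = 1:10)
df$fit   <- df$x
df$deriv <- df$x * 2
df$upper <- df$x * 3

# Names must match the strings used inside aes(); their order is irrelevant
line_cols <- c("95% Prediction" = "green",
               "Avg FF (g/min)" = "black",
               "1st Derivative" = "red")

p <- ggplot(df, aes(x = x)) +
  geom_line(aes(y = fit,   colour = "Avg FF (g/min)")) +
  geom_line(aes(y = deriv, colour = "1st Derivative")) +
  geom_line(aes(y = upper, colour = "95% Prediction")) +
  scale_colour_manual(name = "", values = line_cols)

# Colour actually resolved for each layer, in layer order
resolved <- vapply(ggplot_build(p)$data,
                   function(d) unique(d$colour), character(1))
```

Inspecting `resolved` is one way to check that each layer ended up with the intended colour without having to read it off the rendered plot.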

Issue 3 - You explicitly asked for a colour legend in discrete points, rather than a colour bar

There are two different legend-related functions in ggplot2, guide_legend and guide_colourbar. The former draws discrete legend keys, while the latter draws a continuous colour bar. By default, scale_XXX_gradient actually uses the colourbar option, but the line scale_color_gradient(guide = guide_legend(title = "Feeder Cycle")) overrode that.

To address issue 3, you can:

replace guide_legend(title = "some title") with guide_colourbar(title = "some title"), or;
simply leave it out altogether, and specify the legend name with scale_XXX_gradient(name = "some title").
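The two options can be compared side by side on a toy dataset. This is only a sketch: the data frame `df` and its columns are invented for illustration, and the colour aesthetic is used here only because the toy points use the default shape.

```r
library(ggplot2)

# Toy data with a continuous variable mapped to colour (hypothetical names)
df <- data.frame(x = 1:20, y = sqrt(1:20), cyc = 1:20)
base <- ggplot(df, aes(x, y, colour = cyc)) + geom_point()

# guide_legend() on a continuous scale: discrete keys, as in the question
p_keys <- base +
  scale_colour_gradient(guide = guide_legend(title = "Feeder Cycle"))

# Option 1: spell out the colour bar guide and give it the title
p_bar_guide <- base +
  scale_colour_gradient(guide = guide_colourbar(title = "Feeder Cycle"))

# Option 2: keep the default guide and set the title via `name`
p_bar_name <- base +
  scale_colour_gradient(name = "Feeder Cycle")
```

Printing `p_keys` next to either of the other two plots should show the difference between the discrete keys and the continuous bar.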

(minor) Issue 4 - Top level aesthetic mappings

In your code, every geom layer's aesthetic mappings begin with NW_norm for x, followed by a different variable for y.

Since x = NW_norm is common to all layers, it should be stated in the top level ggplot() call, and only there. All subsequent layers would inherit that, unless there's a local aes(x = some.other.variable) mapping to override it.

Since y = something is different for each layer, there's no need for it to be in the top level call. Every geom layer should have its own aesthetic mapping for y, named explicitly as aes(y = something, ...) rather than aes(something, ...)

Thank you so much. It worked. Only issue now is that the border used from shape = 21 hides the actual gradient as shown in the update inserted above. — Anthony Tantuccio, May 07 '18 at 18:34
@AnthonyTantuccio Try adding `colour = alpha("black", 0)` to the `geom_point` line? I'll update my solution. — Z.Lin, May 08 '18 at 01:09
