I've searched for and tried several suggestions for displaying a custom legend instead of the default one in a grouped scatter ggplot, including this, this, and this, among others.

For instance, let's say I have a df like this one:

df = data.frame(id = c("A", "A", "B", "C", "C", "C"), 
                value = c(1,2,1,2,3,4), 
                ref = c(1.5, 1.5, 1, 2,2,2), 
                min = c(0.5, 0.5, 1,2,2,2))

and I want to display the values of each id as round dots, but also put the reference values and minimum values for each id as a differently shaped dot, as follows:

library(ggplot2)

p = ggplot(data = df) +
  geom_point(aes(x = id, y = value, color = factor(id)), shape = 19, size = 6) +
  geom_point(aes(x = id, y = ref, color = factor(id)), shape = 0, size = 8) +
  geom_point(aes(x = id, y = min, color = factor(id)), shape = 2, size = 8) +
  xlab("") +
  ylab("Value")
#print(p) 

Now all is fine, but my legend doesn't add anything to the interpretation of the plot, as the X axis and colors are enough to understand it. I know I can remove the legend via theme(legend.position = "none"). Instead, I would like to have a legend of what the actual shapes of each dot represent (e.g., filled round dot = value, triangle = min, square = ref).

Besides trying to set the scale values manually via scale_fill_manual, I tried something along these lines:

override.shape = shapes$shape
override.linetype = shapes$pch
guides(colour = guide_legend(override.aes = list(shape = override.shape, linetype = override.linetype)))...
....

I've also tried building a secondary plot without displaying it, just to extract its legend, using something suggested in one of the links above:

shapes  = data.frame(shape = c("value", "reference", "minimum"), pch = c(19,0,2), col = c("gray", "gray", "gray"))
p2 = ggplot(shapes, aes(shape, pch)) + geom_point()  
#print(p2)

g_legend <- function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}
legend <- g_legend(p2)
library(gridExtra)
pp <- arrangeGrob(p, legend,
                  widths=c(5/4, 1/4),
                  ncol = 2)

but then I get the error:

> legend <- g_legend(p2)
Error in tmp$grobs[[leg]] : 
  attempt to select less than one element in get1index

for which I did not find a working solution. Any suggestion on how to show only a legend for the different dot shapes would be welcome. Thank you.

Marius
  • Just add shape and size to the aesthetics you're mapping on, alongside color. Or am I missing something more complicated? – camille Apr 15 '20 at 15:42
  • Thanks for the comment. adding shape `geom_point(aes(x = id, y = value, color = factor(id), shape = 19), size = 6)` (size is not important in this context) results in this error: `Error: A continuous variable can not be mapped to shape`. If it makes a difference, shape is not part of the dataframe, but maybe I should add it there!? – Marius Apr 15 '20 at 15:47
  • You'll want to map it to some variable, like whatever it is that differentiates the first `geom_point` call from the second. But that points to a bigger problem, which is that ggplot is generally intended to be used on long-shaped data where you're not making repeat calls to the same geom just to change some visual element. Maybe take a look at a couple tutorials and see how you could reshape the data – camille Apr 15 '20 at 15:50
  • well, that's why i was trying to create a different plot and not display it, just to generate the legend from a simpler df. because those three sets of points (value, ref, minimum) come from three different columns within the df. so yeah. thought i can just disable the "default" legend and "stitch" a new one :/ – Marius Apr 15 '20 at 15:53
  • The fact that they're in different columns is why you want to reshape the data, so there's a column of values and a column of what they mean (value, ref, or min). Here's one example https://datascience.stackexchange.com/q/66590/50633 – camille Apr 15 '20 at 16:11
  • true true.. I will give it a try. will try to follow the answer from @GGamba – Marius Apr 15 '20 at 16:13

2 Answers

You can manually build a shape legend using scale_shape_manual:

library(ggplot2)

ggplot(data = df) +
  geom_point(aes(x = id, y = value, color = factor(id), shape = 'value'), size = 6) +
  geom_point(aes(x = id, y = ref, color = factor(id), shape = 'ref'), size = 8) +
  geom_point(aes(x = id, y = min, color = factor(id), shape = 'min'), size = 8) +
  scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
  xlab("") +
  ylab("Value")

Created on 2020-04-15 by the reprex package (v0.3.0)
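
As a possible refinement of the snippet above: by default the shape legend is titled `shape` and lists its entries alphabetically. Here is a minimal sketch, assuming the `df` from the question, that sets the title, order, and labels through the `name`, `breaks`, and `labels` arguments of `scale_shape_manual()`. The title "Point type", the label text, and the object name `p_shapes` are only illustrations, not part of the original answer.

```r
library(ggplot2)

# Toy data from the question
df <- data.frame(id = c("A", "A", "B", "C", "C", "C"),
                 value = c(1, 2, 1, 2, 3, 4),
                 ref = c(1.5, 1.5, 1, 2, 2, 2),
                 min = c(0.5, 0.5, 1, 2, 2, 2))

p_shapes <- ggplot(data = df) +
  # A constant string inside aes() creates one legend entry per layer
  geom_point(aes(x = id, y = value, color = id, shape = "value"), size = 6) +
  geom_point(aes(x = id, y = ref, color = id, shape = "ref"), size = 8) +
  geom_point(aes(x = id, y = min, color = id, shape = "min"), size = 8) +
  scale_shape_manual(name = "Point type",                      # illustrative title
                     values = c("value" = 19, "ref" = 0, "min" = 2),
                     breaks = c("value", "ref", "min"),        # legend order
                     labels = c("value", "reference", "minimum")) +
  xlab("") +
  ylab("Value")
```

Printing `p_shapes` draws the plot; the `breaks` vector controls the order of the legend entries, and `labels` must follow the same order.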

But a better way to do this would be to reshape the df to a long format, and map each aes to a variable:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-id) %>% 
  ggplot() +
  geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
  scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
  scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) + 
  xlab("") +
  ylab("Value")

To remove the color legend, use guide_none() (available from ggplot2 3.3.0):

library(tidyr)
library(ggplot2)
df %>% 
  pivot_longer(-id) %>% 
  ggplot() +
  geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
  scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
  scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) + 
  guides(color = guide_none()) +
  xlab("") +
  ylab("Value")

Created on 2020-04-16 by the reprex package (v0.3.0)
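
On ggplot2 versions before 3.3.0, `guide_none()` is not available (see the comments below), but passing the string `"none"` to `guides()` has the same effect. A minimal sketch of that variant, assuming the `df` from the question; the names `df_long` and `p_no_color` are only illustrative:

```r
library(ggplot2)
library(tidyr)

# Toy data from the question
df <- data.frame(id = c("A", "A", "B", "C", "C", "C"),
                 value = c(1, 2, 1, 2, 3, 4),
                 ref = c(1.5, 1.5, 1, 2, 2, 2),
                 min = c(0.5, 0.5, 1, 2, 2, 2))

# Long format: one row per (id, name) pair, with the number in `value`
df_long <- pivot_longer(df, -id)

p_no_color <- ggplot(df_long) +
  geom_point(aes(x = id, y = value, color = id, shape = name, size = name)) +
  scale_shape_manual(values = c("value" = 19, "ref" = 0, "min" = 2)) +
  scale_size_manual(values = c("value" = 6, "ref" = 8, "min" = 8)) +
  guides(color = "none") +   # drops only the color legend; shape/size legend stays
  xlab("") +
  ylab("Value")
```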

Data:

df = data.frame(id = c("A", "A", "B", "C", "C", "C"), 
                value = c(1,2,1,2,3,4), 
                ref = c(1.5, 1.5, 1, 2,2,2), 
                min = c(0.5, 0.5, 1,2,2,2))
GGamba
  • thanks for the suggestions. will give it a try and come back to you. I am insisting on **not** having the "default" legend with all the different colors, and keep only the shape one – Marius Apr 15 '20 at 16:13
  • I've tried your first suggestion, but the "shape" legend never got displayed. Maybe there was not enough place in the plotting area. my data has more than ten different `id`s so maybe that's why :/ – Marius Apr 15 '20 at 16:35
  • is there a way to remove the first legend and let only the manually created one? I am trying to reshape the data as suggested but then I get into `Error: No common type for 'name' and 'value' ` when running on my real data and [this](https://stackoverflow.com/questions/58124530/pivot-longer-with-multiple-classes-causes-error-no-common-type) didn't fix it. – Marius Apr 16 '20 at 08:09
  • Edited with instruction to remove the color guide. To solve the reshape problem we would need your real data (I imagine the data you shared are just a toy example) I suggest you open a new question if you can't solve it – GGamba Apr 16 '20 at 08:22
  • yeah, it's a toy example. cannot share the real data due to confidentiality reasons, but will try to fix it and reshape it. if not, as you suggested, i'll open a new question for that alone. thanks for the quick feedback. i consider this answer to be DA one :] thanks again! – Marius Apr 16 '20 at 08:29
  • OK. It works now. For other users: I did get an `could not find function "guide_none"` error running `guides(color = guide_none())`, but replacing `guide_none()` with `"none"` solved it. Running `ggplot2_3.2.1` – Marius Apr 16 '20 at 09:08
  • I'm on 3.3.0, sry about that – GGamba Apr 16 '20 at 09:10
  • not your fault. was just for other users if they run into the same problem. thanks again for all the help! – Marius Apr 16 '20 at 09:18
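
Regarding the `No common type` error mentioned in the comments above: the real data was never shared, so the following is only a guess at the cause. This kind of error typically appears when `pivot_longer()` is asked to stack columns of different types (for example a character column together with numeric ones), which can happen with `-id` when the data frame has extra columns. A minimal sketch, with a hypothetical extra column `label`, that names the columns to pivot explicitly:

```r
library(tidyr)

# Hypothetical data: the toy columns plus an extra character column `label`
df_mixed <- data.frame(id = c("A", "B"),
                       label = c("first", "second"),
                       value = c(1, 2),
                       ref = c(1.5, 1),
                       min = c(0.5, 1))

# pivot_longer(df_mixed, -id) would also try to stack `label` (character)
# with the numeric columns; selecting the numeric columns by name avoids that
long <- pivot_longer(df_mixed, cols = c("value", "ref", "min"))
```

Columns that are not selected (here `id` and `label`) are kept as identifier columns and repeated for each reshaped row.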

You can tidy your data first using tidyr, and then map the shape aesthetic to the new `name` variable that pivot_longer creates:

library(tidyr)
library(ggplot2)

df2 <- pivot_longer(df, -id)

ggplot(data = df2) +
  geom_point(aes(x = id, y = value, shape = name), size = 6) +
  xlab("") +
  ylab("Value")
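
To see what the shape aesthetic is mapped to here, it may help to inspect the reshaped data. A small sketch, assuming the `df` from the question: with its default arguments, `pivot_longer()` puts the former column names into `name` and the numbers into `value`.

```r
library(tidyr)

# Toy data from the question
df <- data.frame(id = c("A", "A", "B", "C", "C", "C"),
                 value = c(1, 2, 1, 2, 3, 4),
                 ref = c(1.5, 1.5, 1, 2, 2, 2),
                 min = c(0.5, 0.5, 1, 2, 2, 2))

df2 <- pivot_longer(df, -id)

names(df2)        # "id" "name" "value"
nrow(df2)         # 18: six original rows times three pivoted columns
unique(df2$name)  # the three categories that end up in the shape legend
```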

yang
  • +1 because your approach works, but i chose the other answer for completeness and due to the fact that it actually addresses the **custom** legend part of the question and not only reshaping the data. thank you for your time – Marius Apr 16 '20 at 08:31