I have some data

library(data.table)
wide <- data.table(id=c("A","C","B"), var1=c(1,6,1), var2=c(2,6,5), size1=c(11,12,13), size2=c(10,12,10), flag=c(FALSE,TRUE,FALSE))
> wide
   id var1 var2 size1 size2  flag
1:  A    1    2    11    10 FALSE
2:  C    6    6    12    12  TRUE
3:  B    1    5    13    10 FALSE

which I would like to plot as a bubble plot where id is ordered by var2 and the bubbles are drawn as follows. For IDs A and B, var1 is plotted as an "empty" bubble of size size1 and var2 as a "filled" bubble of size size2. ID C is flagged because it has only one value (which is why var1 = var2), so it should get a single "filled" bubble in a different color.

I have tried this as follows:

cols <- c("v1"="blue", "v2"="red", "flags"="green")
shapes <- c("v1"=16, "v2"=21, "flags"=16)
p1 <- ggplot(data = wide, aes(x = reorder(id,var2))) + scale_size_continuous(range=c(5,15))
p1 <- p1 + geom_point(aes(size=size1, y = var1, color = "v1", shape = "v1")) 
p1 <- p1 + geom_point(aes(size=size2, y = var2, color = "v2", shape = "v2", stroke=1.5))
p1 <- p1 + geom_point(data=subset(wide,flag), aes(size=size2[flag], y=var2[flag], color= "flags", shape="flags"))
p1 <- p1 + scale_color_manual(name = "test", 
                                values = cols,
                                labels = c("v1", "v2", "flags"))
p1 <- p1 + scale_shape_manual(name = "test", 
                              values = shapes,
                              labels = c("v1", "v2", "flags"))

which gives (in my theme)

[image: output]

but two questions remain:

  1. What happened to the order in the legend? I followed the recipe in the bottom answer to "Two geom_points add a legend", but somehow the order does not match.
  2. How do I get rid of the stroke around the green bubble, and why is it there?

Overall, something appears to go wrong in matching shape and color.

bumblebee

1 Answer

I admit, it took me a while to understand your slightly convoluted plot. Forgive me, but I have allowed myself to change the way to plot, and make (better?) use of ggplot.

The data shape is less than ideal. ggplot works extremely well with long data. Reshaping your data took a bit of guesswork, and I went the quick and dirty way of simply binding the rows from selected columns.

Now you can see that the new plot needs only a single call to geom_point. The rest is "scale_aesthetic" magic...

In order to combine the shape and color legends, the safest option is to use override.aes. But beware! It does not take named vectors, so the values need to be in the exact order of your legend keys, which is usually alphabetical if you have not defined factor levels.
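As a minimal sketch of that ordering rule (base R only; `key_order` and `cols_ordered` are names made up for this illustration), one can start from the named vector in the question's order, sort the names the way character keys are ordered by default, and drop the names before handing the values to override.aes:

```r
# Named colors in the order used in the question
cols <- c(v1 = "blue", v2 = "red", flag = "green")

# Without factor levels, the legend keys come out sorted alphabetically
key_order <- sort(names(cols))

# Reorder by name, then drop the names: the values are matched by position
cols_ordered <- unname(cols[key_order])

print(key_order)     # "flag" "v1" "v2"
print(cols_ordered)  # "green" "blue" "red"
```

If the keys are a factor instead, the legend follows the factor levels, so the values would have to follow `levels()` rather than the sorted names.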

update re: request to order x labels

This depends hugely on the actual data structure. If it is originally as you have presented, I'd first make id a factor with the levels ordered by your var2. Then do the data reshaping.

library(tidyverse)

# example data; make id a factor ordered by var2 before reshaping
wide <- data.frame(id=c("C","B","A"), var1=c(1,6,1), var2=c(2,6,5), size1=c(11,12,13), size2=c(10,12,10), flag=c(FALSE,TRUE,FALSE))
wide <- wide %>% mutate(id = reorder(id, var2))

# data reshape: one row per bubble
wide1 <- wide %>% filter(!flag) %>% select(id, var = var1, size = size1)
wide2 <- wide %>% filter(!flag) %>% select(id, var = var2, size = size2)
# flagged ids have a single value (var1 == var2), so take var2
wide3 <- wide %>% filter(flag) %>% select(id, var = var2, size = size2)
long <- bind_rows(list(v1 = wide1, v2 = wide2, flag = wide3), .id = "var_id")

# rearrange the vectors for the scales: same (alphabetical) order as the legend keys
cols <- c(flag = "green", v1 = "blue", v2 = "red")
shapes <- c(flag = 16, v1 = 16, v2 = 21)

ggplot(data = long, aes(x = id, y = var)) +
  geom_point(aes(size = size, shape = var_id, color = var_id), stroke = 1.5) +
  scale_size_continuous(limits = c(5, 15), breaks = seq(5, 15, 5)) +
  scale_shape_manual(name = "test", values = shapes) +
  # guide = "none" replaces the deprecated guide = FALSE
  scale_color_manual(values = cols, guide = "none") +
  guides(shape = guide_legend(override.aes = list(color = cols)))
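To check the x-axis order without plotting, the factor levels can be inspected directly. The following is a base-R sketch under the assumption that only `id` and `var2` matter for the ordering; `wide_a` and `id_explicit` are hypothetical names used just for this illustration:

```r
# Same ids and var2 values as in the code above
wide_a <- data.frame(id = c("C", "B", "A"), var2 = c(2, 6, 5))

# reorder() returns a factor whose levels are sorted by var2
wide_a$id <- reorder(wide_a$id, wide_a$var2)
print(levels(wide_a$id))  # "C" "A" "B"

# Equivalent explicit construction with factor() and order()
ids <- c("C", "B", "A")
id_explicit <- factor(ids, levels = ids[order(c(2, 6, 5))])

# If id is turned back into character during reshaping, the order is lost
# and a discrete axis falls back to alphabetical sorting
print(sort(unique(as.character(wide_a$id))))  # "A" "B" "C"
```

If the axis still comes out alphabetical, it may be worth checking with `levels()` that `id` is still a factor in the long data.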

P.S. the reason for the red stroke around the green bubble in your plot is that you also plotted the 'var2' behind your flag.

Created on 2020-04-08 by the reprex package (v0.3.0)

tjebo
  • Apologies for the convoluted plot and thanks for digging through it! I had named the data `wide` because the post I linked had already mentioned that `long` data is preferred. However, I wondered then (and still do) how to set the order of the categories. Here, they seem to be ordered automatically as A, B, C, but how would I order them by their value of `var` within a particular `var_id`? And how do I get those with `var_id==flag` to be part of that ordered sequence? – bumblebee Apr 08 '20 at 13:49
  • Also, when applying the method to my real problem, it works unless the line `scale_size_continuous(...)` is added. Then it returns a warning, `Removed 62 rows containing missing values (geom_point)`, and an empty plot. Any idea what this could be? Does this have to do with the rearrangement of the scale aesthetics? – bumblebee Apr 08 '20 at 15:47
  • @bumblebee to the last point - this is most likely because you have sizes outside the scale limit. Remove the limit argument from the scale_size call, or change it – tjebo Apr 08 '20 at 16:02
  • @bumblebee to the first point - could you specify based on which value you would like to order the IDs? v1 or v2? – tjebo Apr 08 '20 at 16:03
  • This is likely correct. I might have confused limits with something else, but I can only check later. In my post, I ordered by var2 to get the A-B-C order. – bumblebee Apr 08 '20 at 16:07
  • From what I see, you suggest ordering them before transforming into long. I have already tried the same, but ggplot orders them alphabetically nonetheless. – bumblebee Apr 08 '20 at 16:52
  • @bumblebee I have used the reprex package, so it definitely works. If something doesn't work quite as you expect, it is always worth restarting R, removing all predefined objects in the environment, etc. (i.e., starting a fresh session). – tjebo Apr 08 '20 at 16:55
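
On the `Removed 62 rows` warning discussed in the comments: ggplot2's continuous scales censor out-of-bounds values by default (the `oob` argument defaults to `scales::censor`), so any size outside `limits` becomes NA and the point is dropped. A small sketch of that behavior, using made-up sizes rather than the real data:

```r
library(scales)  # installed alongside ggplot2

# Hypothetical sizes, two of them outside the limits c(5, 15)
sizes <- c(2, 10, 20)

# Default out-of-bounds handling: values outside the limits become NA
censored <- censor(sizes, range = c(5, 15))
print(censored)

# Alternative: clamp values to the nearest limit instead of dropping them
squished <- squish(sizes, range = c(5, 15))
print(squished)
```

In contrast, the `range` argument used in the question only sets the smallest and largest point size on the plot and drops nothing. If limits are needed, passing `oob = scales::squish` to `scale_size_continuous()` should keep the out-of-limit points, drawn at the boundary sizes.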