I'm using ggplot and trying to create a plot that has letters as the labels, but full descriptions in the legend. I'm very close to getting it with this code, but as of now the legend displays the correct color information while the plot shows seemingly random colors.

Dput:

structure(list(Category = c("BNPL", "Digital profile", "Voice", 
"Price matching", "Marketing opt-In", "Promo codes", "Two-factor authentication", 
"Using mobile device to locate a product in a physical store", 
"Profile (shopping journey among different channels)", "Inventory"
), Consumers = c(0.401189529, 0.512550424, 0.428663368, 0.468411444, 
0.496927651, 0.560380251, 0.516336569, 0.520916203, 0.433138295, 
0.551730895), Merchants = c(0.548387097, 0.654121864, 0.543010753, 
0.562724014, 0.571684588, 0.632616487, 0.586021505, 0.575268817, 
0.464157706, 0.575268817), Dif = c(-0.147197568, -0.141571439, 
-0.114347385, -0.09431257, -0.074756937, -0.072236237, -0.069684937, 
-0.054352614, -0.031019411, -0.023537922), Country = structure(c(7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("All", "AUS", 
"BRA", "MEX", "UAE", "UK", "US"), class = "factor"), Year = c(2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 2021L
), SumInterest = c(94.9576626, 116.6672288, 97.1674121, 103.1135458, 
106.8612239, 119.2996738, 110.2358074, 109.618502, 89.7296001, 
112.6999712), absdiff = c(14.7197568, 14.1571439, 11.4347385, 
9.431257, 7.4756937, 7.2236237, 6.9684937, 5.4352614, 3.1019411, 
2.3537922), percdiff = c(-0.147197568, -0.141571439, -0.114347385, 
-0.09431257, -0.074756937, -0.072236237, -0.069684937, -0.054352614, 
-0.031019411, -0.023537922), Letter = c("A", "B", "C", "D", "E", 
"F", "G", "H", "I", "J"), color = c("P", "L", "P", "P", "P", 
"L", "L", "L", "P", "L"), hex = c("#ff0054", "#9e0059", "#ff0054", 
"#ff0054", "#ff0054", "#9e0059", "#9e0059", "#9e0059", "#ff0054", 
"#9e0059")), row.names = c(NA, 10L), class = "data.frame")

plot1 <- subset %>%
  ggplot(aes(x = Consumers,
             y = percdiff,
             fill = Category,
             label = Letter)) +
  geom_hline(yintercept=0 ,color = "lightgrey", size=1.5)+
  geom_vline(xintercept=.5, color = "lightgrey", size=1.5)+
  geom_point() +
  geom_label_repel(color="White", segment.color="black") +
  scale_fill_manual(labels = subset$Category, values = subset$hex) +
  guides(
    fill = guide_legend(
      title = '',
      ncol = 1,
      override.aes = list(label = subset$Letter, color = 'white', family = "Lato")
    )
  ) +
  labs(title = "Consumer interest Vs. Awareness Gap",
       subtitle = "In the US 2021",
       x = "Consumer Interest",
       y = "Awareness Gap") +
  theme(legend.position = 'right') +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 0.1))

ZArmstrong
  • It would be easier to help you if you provided [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. If you want to post your data, type `dput(NAME_OF_DATASET)` into the console and copy the output into your post. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 10))`. – stefan Feb 04 '22 at 18:41
  • ... That said, the issue is most likely caused by adding the labels and colors via `subset$Category` and `subset$hex`. To make sure that labels and colors are assigned to the right categories, the best approach is to use named vectors. – stefan Feb 04 '22 at 18:48
  • @stefan Thanks for taking a look. I added the dput output. – ZArmstrong Feb 04 '22 at 19:13

1 Answer

To achieve your desired result you have to set the right order:

  1. Just in case: order your data by the Letter column.
  2. Set the order of the levels of your Category column to follow the letters, for which I make use of forcats::fct_inorder.
  3. Make use of a named vector of colors which assigns a hex value to each of your categories.
  4. After doing so you can use the color vector in scale_fill_manual and get rid of the labels argument, which is no longer necessary.

Note: I dropped the Lato font for the reprex.

library(ggrepel)
#> Loading required package: ggplot2
library(ggplot2)

# Make named vector of colors
colors <- setNames(subset$hex, subset$Category)
# Order by letters
subset <- subset[order(subset$Letter), ]
# Set order of factor
subset$Category <- forcats::fct_inorder(subset$Category)

ggplot(subset, aes(
    x = Consumers,
    y = percdiff,
    fill = Category,
    label = Letter
  )) +
  geom_hline(yintercept = 0, color = "lightgrey", size = 1.5) +
  geom_vline(xintercept = .5, color = "lightgrey", size = 1.5) +
  geom_point() +
  geom_label_repel(color = "White", segment.color = "black") +
  scale_fill_manual(values = colors) +
  guides(
    fill = guide_legend(
      title = "",
      ncol = 1,
      override.aes = list(label = subset$Letter, color = "white")
    )
  ) +
  labs(
    title = "Consumer interest Vs. Awareness Gap",
    subtitle = "In the US 2021",
    x = "Consumer Interest",
    y = "Awareness Gap"
  ) +
  theme(legend.position = "right") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 0.1))
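
Before plotting, it can help to check that the named color vector really covers every category. The following is only a minimal sketch in base R, using a small made-up data frame `df` (hypothetical, not the data from the question) that mirrors the Category and hex columns:

```r
# Toy data (hypothetical), mirroring the Category/hex columns from the question
df <- data.frame(
  Category = c("Voice", "BNPL", "Inventory", "BNPL"),
  hex      = c("#ff0054", "#ff0054", "#9e0059", "#ff0054"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct Category/hex pair
pairs <- unique(df[c("Category", "hex")])

# Each category should map to exactly one color
stopifnot(!anyDuplicated(pairs$Category))

# Named vector: names are the categories, values are the hex codes
colors <- setNames(pairs$hex, pairs$Category)

# Every category present in the data should have an entry
stopifnot(all(unique(df$Category) %in% names(colors)))

# Lookup by name does not depend on row or level order
colors[["Inventory"]]
```

If either `stopifnot()` fails, the color column is inconsistent and `scale_fill_manual(values = colors)` would likely not give the intended mapping.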

stefan
  • Wow! Thank you so much! I'm still a bit unclear on why this fixes it. Was the old code assigning colors by the order of the Category column, so that by reordering it the colors are still assigned by category, but this time correctly? – ZArmstrong Feb 04 '22 at 19:45
  • The issue was your `Category` column. If no order is set, ggplot will order the categories alphabetically, which differs from the order in your dataset. An unnamed vector passed to `values` or `labels` is matched by position against that alphabetical order, so by using `subset$...` you assigned the wrong colors, labels and letters to your categories. That's the reason why I would recommend avoiding dataframe columns and instead adding labels and colors via named vectors. – stefan Feb 04 '22 at 19:53
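
The mismatch described in the last comment can be reproduced without ggplot2. This is a rough sketch in base R with three made-up rows (the third color is hypothetical); it only imitates the positional matching of an unnamed vector against alphabetically sorted levels and does not call any ggplot2 function:

```r
# Data order, as it would appear in the data frame
category <- c("Voice", "BNPL", "Inventory")
hex      <- c("#ff0054", "#9e0059", "#00ff00")  # third color is hypothetical

# A character column becomes a factor with alphabetically sorted levels
lvls <- levels(factor(category))

# Positional pairing: the i-th color goes to the i-th level
positional <- setNames(hex, lvls)

# Named pairing: each color stays attached to its own category
by_name <- setNames(hex, category)

positional[["Voice"]]  # color intended for "Inventory"
by_name[["Voice"]]     # color intended for "Voice"
```

Here `lvls` is "BNPL", "Inventory", "Voice", so the positional pairing gives "Voice" the third color even though the data paired it with the first one.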