How can I fix overlapping geom_text values in ggplot without doing data transformation?

Question

Below is a snapshot of my dataset:

   Town       Age_Group Race       Count_Type Total_Count
   <chr>      <chr>     <chr>      <chr>            <dbl>
 1 Milwaukee  12-17     White      Initial            500
 2 Milwaukee  12-17     White      Full               424
 3 Milwaukee  12-17     Black      Initial           1080
 4 Milwaukee  12-17     Black      Full               771
 5 Milwaukee  12-17     AmerIndian Initial             11
 6 Milwaukee  12-17     AmerIndian Full                 5

Here is the code for the plot. I should also mention that ggplot2 is a hard requirement.

# Visualization
ggplot(data = milwaukee, aes(x = Age_Group, y = Total_Count, fill = Race)) +
  geom_bar(stat = 'identity', position = 'stack') +
  labs(x = 'Age Group', y = 'Total Vaccinated by Age Group',
       title = 'Milwaukee Total Vaccinated by Age Group & Race') + 
  # scale_y_continuous(trans = 'log2') +
  geom_text(aes(label = scales::number(Total_Count, big.mark = ',', accuracy = 1)), size = 2, 
            position = position_stack(vjust = 0.5)) + 
  theme_classic() + 
  theme(text = element_text(size = 9, family = 'mono'), 
        legend.position = 'bottom',
        plot.title = element_text(hjust = 0.5, size = 11))

Sample data

> dput(milwaukee)
structure(list(Town = c("Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", "Milwaukee", 
"Milwaukee", "Milwaukee"), Age_Group = c("12-17", "12-17", "12-17", 
"12-17", "12-17", "12-17", "12-17", "12-17", "18-24", "18-24", 
"18-24", "18-24", "18-24", "18-24", "18-24", "18-24", "25-44", 
"25-44", "25-44", "25-44", "25-44", "25-44", "25-44", "25-44", 
"45-64", "45-64", "45-64", "45-64", "45-64", "45-64", "45-64", 
"45-64", "65+", "65+", "65+", "65+", "65+", "65+", "65+", "65+"
), Race = c("White", "Black", "AmerIndian", "Asian", "Hispanic", 
"MultipleRaces", "Other", "Unknown", "White", "Black", "AmerIndian", 
"Asian", "Hispanic", "MultipleRaces", "Other", "Unknown", "White", 
"Black", "AmerIndian", "Asian", "Hispanic", "MultipleRaces", 
"Other", "Unknown", "White", "Black", "AmerIndian", "Asian", 
"Hispanic", "MultipleRaces", "Other", "Unknown", "White", "Black", 
"AmerIndian", "Asian", "Hispanic", "MultipleRaces", "Other", 
"Unknown"), Count_Type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("Initial", "Full"), class = "factor"), Total_Count = c(500, 
1080, 11, 172, 2404, 105, 135, 272, 1012, 1610, 10, 326, 3051, 
110, 502, 480, 3281, 4185, 34, 738, 10023, 147, 2060, 1907, 4453, 
6361, 41, 695, 9250, 144, 2549, 2043, 4000, 3520, 22, 368, 3554, 
83, 1182, 1354)), row.names = c(NA, -40L), class = c("tbl_df", 
"tbl", "data.frame"))

And below is my messy plot. What can I add or change so that the values do not overlap? Different chart ideas are also welcome.

Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide your sample data in a ready-to-copy format, e.g. with `dput()`. — mnist, Aug 23 '21 at 21:29
In `geom_text()`, you can set `check_overlap = TRUE` to censor overlapping values. — teunbrand, Aug 23 '21 at 21:45
@mnist thanks for suggestion. I just edited the question to provide sample data — user3813620, Aug 23 '21 at 22:22
@teunbrand thanks for the tip, but I would like to be able to show all the labels if at all possible. I just provided sample data above — user3813620, Aug 23 '21 at 22:23
As a matter of aesthetics, you might also consider sorting the Race values in order of frequency instead of alphabetical. Or, depending on what the takeaway message is, you might want to normalize these by population, since it's hard to know if a given number is high or low without knowing how many people there are in each age/race category. — Jon Spring, Aug 23 '21 at 22:40

Jon Spring · Answer 1 · 2021-08-23T22:35:16.117

You might try ggrepel, but it could take some fiddling to get what you want, given that the counts span several orders of magnitude (from 10 to 10,023). I used the direction = "y" parameter so that labels are only shifted up and down (which looks tidier), but you might prefer letting the labels move side-to-side (direction = "x") or in any direction (omit the direction parameter or set it to "both").

...
  ggrepel::geom_text_repel(aes(label = scales::number(Total_Count, big.mark = ',', accuracy = 1)), size = 2, 
            position = position_stack(vjust = 0.5), direction = "y", 
            box.padding = unit(0.01, "lines")) + 
...

...or the same call with direction = "x" and segment.color = NA:
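Spelled out as a complete call, the direction = "x" variant might look like the sketch below. It assumes ggrepel is installed and, to stay short, rebuilds only the first two age groups from the question's dput() output; the theme is trimmed down as well.

```r
library(ggplot2)

# Assumption: a two-age-group subset of the question's data, rebuilt by hand.
races <- c("White", "Black", "AmerIndian", "Asian",
           "Hispanic", "MultipleRaces", "Other", "Unknown")
milwaukee <- data.frame(
  Age_Group   = rep(c("12-17", "18-24"), each = 8),
  Race        = rep(races, times = 2),
  Total_Count = c(500, 1080, 11, 172, 2404, 105, 135, 272,
                  1012, 1610, 10, 326, 3051, 110, 502, 480),
  stringsAsFactors = FALSE
)

p <- ggplot(milwaukee, aes(x = Age_Group, y = Total_Count, fill = Race)) +
  geom_col(position = "stack") +
  ggrepel::geom_text_repel(
    aes(label = scales::number(Total_Count, big.mark = ",", accuracy = 1)),
    size = 2,
    position = position_stack(vjust = 0.5),  # start from the segment midpoints
    direction = "x",                         # labels may only move left/right
    segment.color = NA,                      # hide the connector lines
    box.padding = unit(0.01, "lines")
  ) +
  theme_classic() +
  theme(legend.position = "bottom")

p
```

With direction = "x" the labels can only slide sideways, so how well they separate depends on how much horizontal room each bar leaves; box.padding may need adjusting.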

direction = 'y' worked, but I had no luck with 'x'. Not sure what I did wrong. I'm probably going to stick with that or edit the visuals manually in an SVG editor. Thank you — user3813620, Aug 24 '21 at 03:02

score 1 · Answer 2 · answered Aug 24 '21 at 06:22

Given the data, there is probably no ideal solution to this problem. Too many groups are just too small to be shown in the same bar with labels within/on top of each other.

In general, {ggfittext} does exactly what you are looking for, yet it cannot perform miracles:

ggplot(data = milwaukee, aes(x = Age_Group, y = Total_Count, fill = Race)) +
  geom_bar(stat = 'identity', position = 'stack') +
  labs(x = 'Age Group', y = 'Total Vaccinated by Age Group',
       title = 'Milwaukee Total Vaccinated by Age Group & Race') + 
  ggfittext::geom_bar_text(aes(label = Total_Count), position = "stack", reflow = TRUE, outside = TRUE) +
  theme_classic() + 
  theme(text = element_text(size = 9, family = 'mono'), 
        legend.position = 'bottom',
        plot.title = element_text(hjust = 0.5, size = 11))

I'd suggest either combining some groups, using a relative presentation (shares instead of counts), or adding the missing labels outside of ggplot.
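The first two suggestions could be sketched roughly as follows, using base R for the reshaping. The cutoff `min_total`, the merged label "Smaller groups", and the `Share` column are hypothetical choices for illustration, and the data is a two-age-group subset of the question's dput() output.

```r
library(ggplot2)

# Assumption: a two-age-group subset of the question's data, rebuilt by hand.
races <- c("White", "Black", "AmerIndian", "Asian",
           "Hispanic", "MultipleRaces", "Other", "Unknown")
milwaukee <- data.frame(
  Age_Group   = rep(c("12-17", "18-24"), each = 8),
  Race        = rep(races, times = 2),
  Total_Count = c(500, 1080, 11, 172, 2404, 105, 135, 272,
                  1012, 1610, 10, 326, 3051, 110, 502, 480),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: races whose overall total is below this get merged.
min_total <- 500
race_totals <- tapply(milwaukee$Total_Count, milwaukee$Race, sum)
small_races <- names(race_totals)[race_totals < min_total]

# Combine the small groups, then re-aggregate per age group and race.
lumped <- milwaukee
lumped$Race <- ifelse(lumped$Race %in% small_races, "Smaller groups", lumped$Race)
lumped <- aggregate(Total_Count ~ Age_Group + Race, data = lumped, FUN = sum)

# Relative presentation: each segment's share of its age group.
lumped$Share <- lumped$Total_Count /
  ave(lumped$Total_Count, lumped$Age_Group, FUN = sum)

p <- ggplot(lumped, aes(x = Age_Group, y = Share, fill = Race)) +
  geom_col(position = "stack") +
  geom_text(aes(label = scales::percent(Share, accuracy = 1)),
            size = 2, position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  theme_classic() +
  theme(legend.position = "bottom")

p
```

Merging leaves fewer, taller segments, so the centered labels have more room. The right cutoff depends on the plot size and on whether the small groups matter for the message.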

I found a potential solution [here](https://stackoverflow.com/questions/24626769/alternate-geom-text-position-with-hjust). However, I'm having difficulty adapting it to my dataset. Specifically, I don't understand what is passed to `y` inside the second call to `aes()`. Any help would be highly appreciated — user3813620, Aug 29 '21 at 18:08
