My goal is to create a barplot that visualises the percentages of three variables; however, my current graph does so in a rather confusing way. A little bit of context: each of my variables can have one of two possible values:

  • Reference: null or overt
  • Variety: SING or GB
  • Register: S1A or S1B

Overall, the data frame looks like this (with a few more thousand lines):

     Reference Register Variety
1      null      S1A    SING
2      null      S1A    SING
3      null      S1A    SING
4      null      S1A    SING
5      null      S1A    SING
6      null      S1A    SING

I have used the following code to create the barplot below:

library(dplyr)    # count(), mutate(), %>%
library(ggplot2)  # ggplot(), geom_col(), geom_text()

data_raw <- read.csv("INPUT.csv", header = TRUE, sep = ",")

# One row per Reference/Variety/Register combination, with its share of all rows
data_2 <- data_raw %>%
  count(Reference, Variety, Register) %>%
  mutate(pct = n / sum(n),
         pct_label = scales::percent(pct))

ggplot(data_2, aes(x = Reference, fill = Variety, y = pct)) +
  geom_col() +
  geom_text(aes(label = paste(pct_label, n, sep = "\n")),
            lineheight = 0.8,
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)

Preliminary barplot

The third variable, Register, is represented by two separate values within a single-coloured box, e.g., 684/20.22% (S1B) and 931/27.52% (S1A) for the variety GB. While I can infer from my data which of these two values stands for S1A or S1B, I need this to be apparent from the barplot as well. For example, would it be possible to add a label to "684/20.22%" that indicates that it is the S1B value?

Another obvious problem is that the bars for the x-value "null" represent very low percentages, which makes their labels hard to read. I'm not sure what would be the best way to handle this. Perhaps it would make sense to do away with the numbers altogether and rely on colours only.

I'd be very grateful for any suggestions or solutions to my problem. I'm still a beginner and hope to become better at using R for data analysis.

1 Answer

If you just want to add Register to the label, I think adding it to the paste() call should work:

...
geom_text(aes(label = paste(Register, pct_label, n, sep = "\n")), 
...
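
For reference, here is a minimal, self-contained sketch of that change. INPUT.csv is not available here, so the data frame below is a hypothetical stand-in with made-up counts; only the geom_text() label differs from the code in the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts standing in for INPUT.csv (the numbers are invented)
counts <- data.frame(
  Reference = rep(c("null", "overt"), each = 4),
  Variety   = rep(rep(c("GB", "SING"), each = 2), times = 2),
  Register  = rep(c("S1A", "S1B"), times = 4),
  n         = c(5, 3, 8, 4, 60, 40, 50, 30)
)

# Expand to one row per observation, like the original data frame
data_raw <- counts[rep(seq_len(nrow(counts)), counts$n),
                   c("Reference", "Register", "Variety")]

data_2 <- data_raw %>%
  count(Reference, Variety, Register) %>%
  mutate(pct = n / sum(n),
         pct_label = scales::percent(pct))

# Same plot as in the question, with Register added as the first label line
p <- ggplot(data_2, aes(x = Reference, fill = Variety, y = pct)) +
  geom_col() +
  geom_text(aes(label = paste(Register, pct_label, n, sep = "\n")),
            lineheight = 0.8,
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)

print(p)
```

Each box then carries three lines: the register, the percentage, and the raw count.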

However, I think you may want to look for a more visual way to show the Register variable, such as adding stripes or making one of its levels semi-transparent.
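
One possible way to try the semi-transparency idea is to map Register to alpha. This is only a sketch under assumptions: the counts are invented, the alpha values 1 and 0.5 are arbitrary choices, and the explicit group aesthetic is my addition so that the bars and the labels are stacked in the same order.

```r
library(dplyr)
library(ggplot2)

# Hypothetical pre-counted data (invented numbers), same shape as data_2
data_2 <- data.frame(
  Reference = rep(c("null", "overt"), each = 4),
  Variety   = rep(rep(c("GB", "SING"), each = 2), times = 2),
  Register  = rep(c("S1A", "S1B"), times = 4),
  n         = c(5, 3, 8, 4, 60, 40, 50, 30)
) %>%
  mutate(pct = n / sum(n),
         pct_label = scales::percent(pct))

p <- ggplot(data_2,
            aes(x = Reference, y = pct, fill = Variety,
                # explicit grouping so both layers stack identically
                group = interaction(Variety, Register))) +
  # Register controls the opacity of each segment
  geom_col(aes(alpha = Register)) +
  geom_text(aes(label = paste(pct_label, n, sep = "\n")),
            lineheight = 0.8,
            position = position_stack(vjust = 0.5)) +
  # arbitrary opacity levels: S1A solid, S1B half-transparent
  scale_alpha_manual(values = c(S1A = 1, S1B = 0.5)) +
  scale_y_continuous(labels = scales::percent)

print(p)
```

The legend then shows which shade belongs to which register, so the labels themselves can stay short.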

To jitter crowded labels, you can look at this post.

Yuan Yin