
I had previously asked about using geom_bar to plot the proportions of a binary variable.

This solution worked well. Now I need to add data labels to the plot. However, geom_text requires y and label aesthetics, and I am not sure how to supply them with the following syntax:

df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x=Year, fill = Remission)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels=scales::percent) +
  labs(y = "Proportion")

Is it possible to add data labels to this kind of stacked bar chart?

Secondary question: given that it is a proportion, the top and bottom labels provide the same information. Is it possible to only label the "lower" bar?

Steven
  • I linked to the FAQ on the topic: if you still have trouble I'd suggest asking a new question that includes sample data to illustrate the problem and shows your attempt(s) based on the FAQ--understanding the problem and what's been tried will make it easy to help. – Gregor Thomas Aug 22 '22 at 20:47

1 Answer

Although ggplot is good at performing common summary operations, people sometimes tie themselves in knots trying to get ggplot to do data wrangling that is actually straightforward to do en route to ggplot. Simply create the proportions and labels as columns in the data you are passing.

library(tidyverse)

df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  # Count the rows in each Year/Remission combination
  count(Year, Remission) %>%
  # Within each year, convert the counts to proportions
  group_by(Year) %>%
  mutate(Proportion = n / sum(n),
         # Label only the Remission == 1 segment; leave the other one blank
         label = ifelse(Remission == 1, scales::percent(Proportion), "")) %>%
  ggplot(aes(x = Year, y = Proportion, fill = Remission)) +
  geom_col() +
  # Centre each label vertically within its segment
  geom_text(aes(label = label), position = position_fill(vjust = 0.5),
            size = 7) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Proportion") +
  scale_fill_brewer(palette = "Pastel1") +
  theme_minimal(base_size = 20)



Data from previous question in reproducible format

df <- structure(list(Client_id = c(2L, 4L, 7L, 8L, 12L), Year = c(2016L, 
2017L, 2017L, 2016L, 2016L), Remission = c(0L, 1L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(NA, -5L))
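
If it helps to see what the wrangling step hands to ggplot, here is a minimal sketch that stops before plotting and prints the summarised table. The name plot_data is only an illustrative choice, and the sketch assumes the dplyr and scales packages are installed.

```r
library(dplyr)

# Sample data from the question, rebuilt so the block runs on its own
df <- data.frame(
  Client_id = c(2L, 4L, 7L, 8L, 12L),
  Year      = c(2016L, 2017L, 2017L, 2016L, 2016L),
  Remission = c(0L, 1L, 0L, 1L, 1L)
)

# plot_data is a hypothetical name for the summarised table
plot_data <- df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  count(Year, Remission) %>%      # one row per Year/Remission pair
  group_by(Year) %>%
  mutate(Proportion = n / sum(n), # proportions sum to 1 within each year
         label = ifelse(Remission == 1, scales::percent(Proportion), "")) %>%
  ungroup()

print(plot_data)
```

There is one row per bar segment, with the y value and the label already in place, so geom_col and geom_text need no further statistics.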
Allan Cameron