I would like to add percentage labels to a stacked bar plot using ggplot2. Here is my code; it does not work so far.

df <- longer_data %>% 
  drop_na(response) %>%
  group_by(question) %>%
  count(response) %>%
  mutate(prop = percent(response / sum(response))) %>%
  mutate(response = factor(response, levels = 1:3, labels = c("Yes", "No", "I don't know"))) %>% 
  mutate(prop = percent(response / sum(response))) %>%
  ggplot(df, aes(x = question, fill = response)) +
  geom_bar(stat= "count", position = "fill") +
  labs(title = " Please indicate which part of the driving task shown on the interface are \n performed by the car or you, the driver of the car.",
       subtitle = " Speed and distance control") +
  scale_fill_manual(values = c("Yes" = "Forestgreen", "No" = "Darkred", "I don't know" = "Grey")) +
  labs(x = "HMIs", y = "Percentage") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 360, hjust = 0))

Here is a snippet of my data:

structure(list(question = c("HMI1", "HMI2", "HMI3", "HMI4", "HMI5",
  "HMI6", "HMI1", "HMI2", "HMI3", "HMI4"), response = c("1", "1",
   "1", "1", "1", "1", "1", "1", "1", "3")), 
   row.names = c(NA, -10L ), class = c("tbl_df", "tbl", "data.frame"))
stefan
  • To help us to help you, would you mind providing [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data. To share your data, you could type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 10))` for the first ten rows of data. – stefan Dec 08 '21 at 19:40
  • Thanks a lot. Here it is: structure(list(question = c("HMI1", "HMI2", "HMI3", "HMI4", "HMI5", "HMI6", "HMI1", "HMI2", "HMI3", "HMI4"), response = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "3")), row.names = c(NA, -10L ), class = c("tbl_df", "tbl", "data.frame")) – walkytalky Dec 08 '21 at 20:25
  • Does this help you? – walkytalky Dec 08 '21 at 20:41
  • Yep. That helped. – stefan Dec 08 '21 at 20:44

1 Answer

Basically, you were on the right track. The easiest way is probably to aggregate your dataset before passing it to ggplot2.

  1. To get the data wrangling right:
  • count by question and response
  • compute the proportion (prop) of each response within each question
  2. Plot:
  • map prop on y
  • add percentage labels to the bars via geom_text. Because this is a stacked bar chart, the labels also need position = position_stack(); I used vjust = .5 to place each label in the middle of its bar segment.

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

longer_data <- structure(
  list(question = c("HMI1", "HMI2", "HMI3", "HMI4", "HMI5",
                    "HMI6", "HMI1", "HMI2", "HMI3", "HMI4"),
       response = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "3")),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

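# Count each response per question, then turn the counts into
# within-question proportions and label the response codes
df <- longer_data %>%
  drop_na(response) %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(prop = n / sum(n),
         response = factor(response, levels = 1:3, labels = c("Yes", "No", "I don't know")))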

ggplot(df, aes(x = question, y = prop, fill = response)) +
  geom_col() +
  geom_text(aes(label = percent(prop)), position = position_stack(vjust = .5)) +
  labs(title = " Please indicate which part of the driving task shown on the interface are\nperformed by the car or you, the driver of the car.", 
       subtitle = " Speed and distance control") +
  scale_fill_manual(values = c("Yes" = "Forestgreen", "No" = "Darkred", "I don't know" = "Grey")) +
  labs(x = "HMIs", y = "Percentage") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 360, hjust = 0))
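
Before plotting, it can help to look at the aggregated data on its own. The following is a minimal sketch of such a check, using the sample data from the question; the names `props` and `check` are only illustrative and are not part of the code above. The idea is that the proportions within each question should add up to 1.

```r
library(dplyr)

# Sample data copied from the question's dput() output
longer_data <- data.frame(
  question = c("HMI1", "HMI2", "HMI3", "HMI4", "HMI5",
               "HMI6", "HMI1", "HMI2", "HMI3", "HMI4"),
  response = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "3"),
  stringsAsFactors = FALSE
)

# Same aggregation as in the answer: counts, then within-question shares
props <- longer_data %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Sanity check: the shares of each question should sum to 1
check <- props %>%
  group_by(question) %>%
  summarise(total = sum(prop))
stopifnot(all(abs(check$total - 1) < 1e-9))
```

If the check fails on the real data, the grouping step is the first place to look.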

stefan
  • Thanks a lot. However, this does not work. I receive this error message. Any idea? Error: You're passing a function as global data. Have you misspelled the `data` argument in `ggplot()` Run `rlang::last_error()` to see where the error occurred. – walkytalky Dec 08 '21 at 23:17
  • When I only run the first part, I receive the following error message: df <- longer_data %>% drop_na(response) %>% count(question, response) %>% group_by(question) %>% mutate(prop = n / sum(n), response = factor(response, levels = 1:3, labels = c("Yes", "No", "I don't know"))) Error in count(., question, response) : object 'question' not found – walkytalky Dec 08 '21 at 23:19
  • Hm. Hard to tell what the issue is. For `longer_data` I used the data you added via `dput()`, which contains a column called `question`. Could you check the structure of your dataset `longer_data`, e.g. via `str(longer_data)`? – stefan Dec 08 '21 at 23:30
  • Hi, thanks. Yes, strange. Here it is: tibble [6,822 × 2] (S3: tbl_df/tbl/data.frame) $ question: chr [1:6822] "HMI1" "HMI2" "HMI3" "HMI4" ... $ response: int [1:6822] 3 3 1 1 2 1 2 3 1 3 ... – walkytalky Dec 08 '21 at 23:50
  • Weird. Both columns are present in your dataset. What I do in such cases: Restart my R session and run the code again. – stefan Dec 09 '21 at 00:05
  • Hi Stefan, I tried and restarted my session. I receive this error message now: Error: You're passing a function as global data. Have you misspelled the `data` argument in `ggplot()` Any other idea? Thank you very much – walkytalky Dec 09 '21 at 19:33
  • Hm. The issue is probably that there is no dataset `df` in your environment. Instead, ggplot uses the `df()` function, which results in the error message. However, I have no clue what the reason for that is. As you can see in the reprex, I create a dataset `df` based on the dataset `longer_data`, for which I used the data you provided via `dput()`. I have just rerun my code and made an edit to include the data in the reprex. First: take the reprex as provided and check that you get the same plot. Next: check with your real data. Your data should go into the line `longer_data <- ... `. – stefan Dec 09 '21 at 19:45
  • Thanks, Stefan! Now it worked using the first code you provided. Strange, but very nice to receive this valuable information. Do you happen to know how to reorder the percentages, e.g., from lowest to highest or vice versa? Is this simply done with the reorder function? Thank you – walkytalky Dec 10 '21 at 00:09
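
The "passing a function as global data" error discussed in the comments can be illustrated with a short sketch. It assumes a session in which no object called `df` has been created, so the name falls through to `stats::df()`, the density function of the F distribution. The helper `check_plot_data()` is hypothetical and not part of ggplot2; it only shows one way to fail early with a clearer message.

```r
# With no user-created `df`, the name resolves to stats::df()
user_df_exists <- exists("df", envir = globalenv(), inherits = FALSE)
resolves_to_function <- is.function(df)

# Hypothetical guard: stop with a clear message before calling ggplot()
check_plot_data <- function(x) {
  if (!is.data.frame(x)) {
    stop("Expected a data frame; did the step that creates it run without error?")
  }
  invisible(x)
}

# A data frame passes the guard; the bare function `df` would not
ok <- check_plot_data(data.frame(question = "HMI1", prop = 1))
```

This matches the sequence in the thread: when the wrangling step fails (for example with "object 'question' not found"), `df` is never assigned, and the later `ggplot(df, ...)` call receives the function instead.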