
I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).

I would like to use ggplot2 to

  1. Create a stacked percentage bar graph with the IDs on the vertical axis (arranged alphabetically, with A at the top) and the percentage on the horizontal axis. I can't find code that sorts the IDs alphabetically automatically; my original dataset has many categories, so arranging them manually would be difficult.
  2. Have the infected category (1) shown in red and placed to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_Infected" to "Non-infected"?
  3. Have each ID label in the plot include the number of times that ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".

My sample code is as follows:

ID <- c("A","A","A","A","A","A",
        "B","B","B",
        "C","C","C","C","C","C","C",
        "D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = TRUE)
df <- data.frame(ID, Infection)

library(ggplot2)
library(dplyr)
library(reshape2)

df.plot <- df %>% 
  group_by(ID) %>% 
  summarize(Infected = sum(Infection)/n(),
            Non_Infected = 1-Infected)

df.plot %>% 
  melt() %>% 
  ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") + 
  xlab("ID") + 
  ylab("Percent Infection") +
  scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
  coord_flip()

Right now I managed to get this output:

enter image description here

I hope to get this:

enter image description here

Thank you so much!

Huicong

1 Answer


First, we need to add a per-ID count to the summarized data frame.

df.plot <- df %>% 
  group_by(ID) %>% 
  summarize(Infected = sum(Infection)/n(),
            Non_Infected = 1-Infected,
            count = n())

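Then we append the count to each ID label, turn the infection status (the variable column produced by melt()) into a factor with an explicit level order, use forcats::fct_rev to reverse the ID ordering so that A ends up at the top after coord_flip(), and use scale_fill_manual to control the colors and legend labels.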

df.plot %>% 
  mutate(ID = paste0(ID, " (n=", count, ")")) %>%
  select(-count) %>%
  melt() %>% 
  mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
  ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) + 
  geom_bar(stat = "identity", position = "stack") + 
  xlab("ID") + 
  ylab("Percent Infection") +
  scale_fill_manual("Infection Status", 
                    values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
                    labels = c("Non-Infected", "Infected"))+
  coord_flip()

enter image description here
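As a variation on the code above, here is a minimal sketch that works from the raw rows instead of a pre-computed summary, letting geom_bar(position = "fill") compute the proportions and scales::percent format the axis as percentages. It is an alternative, not the method used above; the names plot_df, n_id, label and status are made up for illustration, and set.seed(1) is only there to make the random sample repeatable.

```r
library(dplyr)
library(ggplot2)

# Rebuild the sample data from the question (seed added for repeatability).
set.seed(1)
ID <- rep(c("A", "B", "C", "D"), times = c(6, 3, 7, 9))
df <- data.frame(ID, Infection = sample(c(1, 0), length(ID), replace = TRUE))

plot_df <- df %>%
  add_count(ID, name = "n_id") %>%                      # rows per ID, kept on every row
  mutate(
    label  = paste0(ID, " (n=", n_id, ")"),             # e.g. "A (n=6)"
    status = factor(Infection, levels = c(0, 1),
                    labels = c("Non-infected", "Infected"))
  )

p <- ggplot(plot_df, aes(x = forcats::fct_rev(label), fill = status)) +
  geom_bar(position = "fill") +                         # stacks to 100% per ID
  scale_y_continuous(labels = scales::percent) +        # 0.25 is shown as 25%
  scale_fill_manual("Infection Status",
                    values = c("Infected" = "#F8766D",
                               "Non-infected" = "#00BFC4")) +
  xlab("ID") +
  ylab("Percent Infection") +
  coord_flip()                                          # IDs on the vertical axis
```

Calling print(p) draws the plot. Because the legend text comes from the factor labels, no separate labels argument is needed in this version.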

bouncyball
  • Hi thanks this is great! Is there a way to keep the original colour theme and also for the legend can I change to "Non-Infected" instead of "Non_Infected"? – Huicong Mar 12 '20 at 17:21
  • We can define the colors (see: https://stackoverflow.com/questions/8197559/emulate-ggplot2-default-color-palette) and use the `labels` argument in `scale_fill_manual` – bouncyball Mar 12 '20 at 17:26
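To illustrate the comment above: the two hex codes used in scale_fill_manual are ggplot2's default two-colour palette, which can be generated rather than typed by hand. A small sketch, assuming the scales package is installed (it is a ggplot2 dependency); the variable name default_cols is made up for illustration.

```r
library(scales)

# Default ggplot2 discrete palette for two groups.
default_cols <- hue_pal()(2)

# Name the colours so they can be passed to scale_fill_manual(values = ...).
names(default_cols) <- c("Infected", "Non_Infected")
default_cols
```

Passing a larger number to hue_pal() gives the default colours for that many groups.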