I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to
- Create a stacked percentage bar graph with the various ID on the verticle-axis (arranged alphabetically with A starting on top), and the percent on the horizontal-axis. I can't seem to get a code that will automatically sort the ID alphabetically as my original dataset has quite a number of categories and will be difficult to arrange them manually.
- I also hope to have the infected category (1) to be red and towards the left of the blue non-infected category (0). Is it also possible to change the sub-heading of the legend box from "Non_infected" to "Non-infected"?
- I hope that the displayed ID in the plot will include the count of the number of times the ID appeared in the dataset. E.g. "A (n=6)", "B (n=3)"
My sample code is as follow:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!