I'm looking for a solution to the following problem:
I have data that contains two factor variables, EDU and LEVEL. A reproducible data sample is here:
df <- structure(list(EDU = structure(c(3L, 1L, 2L, 2L, 3L, 2L, 3L,
2L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L, 2L,
2L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L,
3L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 1L), .Label = c("A",
"B", "C"), class = "factor"), LEVEL = structure(c(3L, 3L, 4L,
2L, 4L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 2L, 2L,
4L, 1L, 2L, 3L, 3L, 1L, 4L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 2L, 1L,
4L, 3L, 1L, 4L, 3L, 4L, 1L, 4L, 2L, 4L, 1L, 1L, 4L, 3L, 1L), .Label = c("1",
"2", "3", "4"), class = "factor")), class = "data.frame", row.names = c(NA,
-50L))
Using this data, I want to draw a bar plot with ggplot2 showing the grouping variable EDU on the x-axis and the cumulative percentages of LEVEL on the y-axis. Additionally, I want to add a fourth bar that contains the percentages of LEVEL not grouped by EDU, somewhat like an "overall bar". Furthermore, I want to add percentage labels within the plot, so that every LEVEL is labelled with its corresponding relative frequency, like in this thread or this thread. To be honest, I tried to adapt my code using different solutions from Stack Overflow to get the percentage labels into the plot, as there are a lot of threads on this topic, but I got stuck. So far, my ggplot2 code looks like this:
library(tidyverse)
ggplot(df, aes(x=EDU, fill=LEVEL)) +
geom_bar(position="fill") +
scale_y_continuous(labels = scales::percent)
This results in the following plot:
That plot looks good so far. But as mentioned above, my aim is to add percentage labels, probably with geom_text, AND a fourth "overall bar" beside the three existing ones. For the percentage labels, I also tried building a prop.table and adding the labels from the corresponding proportions with annotate:
props <- prop.table(table(df$EDU, df$LEVEL), margin=1)
ggplot(df, aes(x=EDU, fill=LEVEL)) +
geom_bar(position="fill") +
scale_y_continuous(labels = scales::percent) +
annotate("text", x="A", y=.15, label=scales::percent(props[1,4])) +
annotate("text", x="B", y=.10, label=scales::percent(props[2,4])) +
annotate("text", x="C", y=.275, label=scales::percent(props[3,4])) +
annotate("text", x="A", y=.375, label=scales::percent(props[1,3])) +
annotate("text", x="B", y=.275, label=scales::percent(props[2,3])) +
annotate("text", x="C", y=.625, label=scales::percent(props[3,3])) +
annotate("text", x="A", y=.66, label=scales::percent(props[1,2])) +
annotate("text", x="B", y=.5, label=scales::percent(props[2,2])) +
annotate("text", x="C", y=.78, label=scales::percent(props[3,2])) +
annotate("text", x="A", y=.9, label=scales::percent(props[1,1])) +
annotate("text", x="B", y=.9, label=scales::percent(props[2,1])) +
annotate("text", x="C", y=.9, label=scales::percent(props[3,1]))
That results in the following plot:
This seems cumbersome to me, especially when I want to create more than one plot and have to annotate each percentage separately. So the question here is how I can set the y arguments in annotate in an automated way, so that R positions the labels for me.
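To make the labelling part concrete, here is a minimal sketch of the kind of automated positioning I mean. It is not a confirmed solution: it assumes the percentages are computed up front with dplyr and that position_fill(vjust = 0.5) places each label in the middle of its segment. The data frame toy and the objects pct and p are hypothetical stand-ins, not my real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df with the same two factor columns
toy <- data.frame(
  EDU   = factor(rep(c("A", "B", "C"), times = c(15, 15, 20))),
  LEVEL = factor(rep(1:4, length.out = 50))
)

# Share of each LEVEL within each EDU group
pct <- toy %>%
  count(EDU, LEVEL) %>%
  group_by(EDU) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

p <- ggplot(pct, aes(x = EDU, y = prop, fill = LEVEL)) +
  geom_col(position = "fill") +
  # vjust = 0.5 centres every label inside its own bar segment
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
p
```

With something like this, the y positions would come from the stacking itself rather than from hand-picked values in annotate.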
Unfortunately, I have no idea how to solve the "overall bar" problem.
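To illustrate what the "overall bar" should show, here is a rough sketch of one possible direction, under the assumption that the data can simply be stacked on top of a copy of itself in which EDU is replaced by a single shared label. The names toy, with_overall and p_overall and the label "Overall" are hypothetical, and I have not checked whether this is the idiomatic way to do it.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df with the same two factor columns
toy <- data.frame(
  EDU   = factor(rep(c("A", "B", "C"), times = c(15, 15, 20))),
  LEVEL = factor(rep(1:4, length.out = 50))
)

# Append a second copy of the rows whose EDU is the shared label "Overall"
with_overall <- bind_rows(
  mutate(toy, EDU = as.character(EDU)),
  mutate(toy, EDU = "Overall")
) %>%
  # Keep the original groups first and put the overall bar last
  mutate(EDU = factor(EDU, levels = c(levels(toy$EDU), "Overall")))

p_overall <- ggplot(with_overall, aes(x = EDU, fill = LEVEL)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
p_overall
```

Because position = "fill" normalises each bar separately, the duplicated rows should only affect the fourth bar and leave the three EDU bars unchanged.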
I'm grateful for any help!