So, I have a series of plots that I would like to make, based on a survey. Each plot depends on a corresponding column of a data frame, and each column holds a different range of numbers, from 1 to x, where x depends on the question the plot relates to (some questions are answered from 1 to 5, some from 1 to 7, and so on).
I would like a fixed legend/key for those questions/plots that share the same answer options, e.g. c("Strongly disagree", "Disagree", "Somewhat disagree", "Neither agree or disagree", "Somewhat agree", "Agree", "Strongly agree"). The first option, "Strongly disagree", is a "1" in the data, "Disagree" is a "2", and so on.
To make them easily comparable, they should have the same legend/key with the same options and colours.
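One way to express "same options and colours for every plot" is to define the labels and colours once and reuse them. The following is only a sketch under assumptions: the names `likert7_labels`, `likert7_colours`, `as_likert7()` and `scale_fill_likert7()` are hypothetical helpers, not part of ggplot2, and the colours simply come from `scales::hue_pal()`.

```r
library(ggplot2)

# Hypothetical shared definitions for every 7-point agreement question.
likert7_labels <- c("Strongly disagree", "Disagree", "Somewhat disagree",
                    "Neither agree or disagree", "Somewhat agree",
                    "Agree", "Strongly agree")

# One colour per label; naming the vector ties each colour to its label.
likert7_colours <- setNames(scales::hue_pal()(7), likert7_labels)

# Hypothetical helper: turn the numeric codes 1..7 into a factor that always
# carries all seven levels, whether or not each one was chosen.
as_likert7 <- function(x) factor(x, levels = 1:7, labels = likert7_labels)

# Hypothetical helper: the same fill scale for every plot of this type.
scale_fill_likert7 <- function(name = waiver()) {
  scale_fill_manual(name = name, values = likert7_colours, drop = FALSE)
}
```

Each plot for a 7-point question could then map `fill = as_likert7(some_column)` and add `scale_fill_likert7("Some title")`, so the legend definition lives in one place.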
My problem is that in a number of cases one or more answer options of a question were not chosen by any respondent. My current code looks something like this:
education_plot <- ggplot(Data) +
  aes(Cluster, fill = as.character(Education)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(name = "Level of education", labels = c("No schooling completed", "Some high school, no diploma", "High school graduate, diploma or the equivalent", "College graduate", "Trade/technical/vocational training", "Bachelors degree", "Masters degree", "Doctorate degree"))
I have a number of these code blocks, one for each graph. Each graph should display how often each option was chosen (scaled to 100%) in each respondent cluster.
Example: if no respondent chose "No schooling completed" ("1"), the legend/key still uses this term and assigns it a colour, but displays the answers for "Some high school, no diploma" ("2") in that colour. The legend/key therefore attaches the wrong names to the values and does not show all of the answer options: it cuts off the last n answer options, where n is the number of options that nobody chose.
Here the last answer option, "Doctorate degree", is cut off, although it was actually the first option, "No schooling completed", that nobody chose. That first option is nevertheless shown and coloured with the "wrong" data, when it should have 0/no bar.
Can someone help me set up a legend/key that is always printed in full and shows the correct values, including 0 for options not chosen by any respondent?
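The shift described above is consistent with how `as.character()` behaves: it keeps only the values that actually occur, so the information that an option "1" exists is lost before the plot is built. A small illustration with made-up answers (the vector `answers` is invented for this sketch and is not the survey data):

```r
# Toy answers in which nobody chose option 1 (made-up values).
answers <- c(2, 3, 3, 5, 8)

# as.character() keeps only the values that occur, so only four groups
# exist; unnamed labels would then be paired with them by position.
observed <- sort(unique(as.character(answers)))
observed
# "2" "3" "5" "8"

# A factor with explicit levels remembers all eight options.
all_options <- factor(answers, levels = 1:8)
levels(all_options)
# "1" "2" "3" "4" "5" "6" "7" "8"

# Options nobody chose are still present, with a count of 0.
table(all_options)
```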
Edit: my test code looks like this:
color_mapping <- setNames(hue_pal()(8), 8)
education_plot <- ggplot(Data) +
  aes(Cluster, fill = as.character(Education)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(name = "Level of education", values = color_mapping, drop = FALSE, labels = c("No schooling completed", "Some high school, no diploma", "High school graduate, diploma or the equivalent", "College graduate", "Trade/technical/vocational training", "Bachelors degree", "Masters degree", "Doctorate degree"))
The problem is that the last label ("Doctorate degree") is still not shown in the legend, and the data is coloured/matched incorrectly, since in this example no respondent answered "No schooling completed". My code simply doesn't know how to match the right value (1-8 in this example) to the right category (label), so it finds 7 different values (2-8) and assigns them to the first 7 labels I defined. How do I tell my code how to match them? And shouldn't the legend at least show "Doctorate degree", since I set drop = FALSE?
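One possible way to make the matching explicit, offered as a sketch rather than a definitive fix: convert `Education` to a factor whose levels are the codes 1 to 8 and whose labels are the category names, and name each colour after its label. `drop = FALSE` can only keep levels that the factor actually carries, which a character vector does not. Note also that `setNames(x, 8)` supplies a single name ("8") for eight colours, so the sketch names the colours after the labels instead. The small `Data` frame below is a made-up stand-in, and `show.legend = TRUE` is an assumption about what recent ggplot2 versions (3.5.0 or later) need in order to draw keys for unused levels.

```r
library(ggplot2)

# Small made-up stand-in for `Data`; option 1 is never chosen.
Data <- data.frame(
  Education = c(7, 4, 7, 8, 6, 2, 3, 5, 6, 4),
  Cluster   = c(4L, 4L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 1L)
)

education_labels <- c(
  "No schooling completed", "Some high school, no diploma",
  "High school graduate, diploma or the equivalent", "College graduate",
  "Trade/technical/vocational training", "Bachelors degree",
  "Masters degree", "Doctorate degree"
)

# Codes 1..8 become a factor whose levels are the labels, so the match
# between value and label is explicit rather than positional.
Data$Education <- factor(Data$Education, levels = 1:8,
                         labels = education_labels)

# Name each colour after its label so colours are matched by name.
color_mapping <- setNames(scales::hue_pal()(8), education_labels)

education_plot <- ggplot(Data) +
  aes(Cluster, fill = Education) +
  # show.legend = TRUE requests keys for unused levels as well (assumed to
  # be needed on ggplot2 3.5.0 or later; harmless on older versions).
  geom_bar(position = "fill", show.legend = TRUE) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(name = "Level of education",
                    values = color_mapping, drop = FALSE)
```

With the labels stored in the factor, the `labels =` argument of `scale_fill_manual()` is no longer needed, which removes the positional matching altogether.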
Dataset produced by dput():
! structure(list(Education = c(7, 4, 7, 7, 8, 6, 6, 8, 8, 6, 4, 5, 6, 7, 6, 8, 4, 4, 8, 7, 7, 3, 5, 7, 4, 4, 7, 7, 7, 5, 7, 3, 7, 8, 6, 8, 5, 7, 5, 6, 4, 6, 3, 6, 7, 7, 6, 4, 2, 7, 3, 6, 4, 4, 6, 6, 4, 4, 8, 7, 4, 4, 8, 6, 5, 7, 7, 7, 7, 4, 6, 4, 8, 8, 7, 8, 8, 6, 7, 4, 6, 6, 6, 5, 6, 7, 7, 4, 7, 6, 7, 7, 7, 4, 6, 7, 6, 3, 7, 7, 7, 6, 6, 4, 6, 4, 6, 4, 8, 7, 4, 5, 4, 6, 4, 7, 6, 6, 4, 7, 6, 6, 8, 7, 8, 5, 7, 7, 8, 7, 6, 6, 6, 4, 8, 7, 8, 6, 6, 4, 7, 6, 6, 6, 3, 7, 7, 4, 8, 8, 7, 8, 7, 4, 6, 4, 8, 6, 7, 7, 3, 7, 5, 8, 6, 3, 7, 7, 8, 4, 8, 6, 7, 7, 6, 6, 3, 6, 6, 8, 6, 6, 2, 4, 7, 6, 8, 8, 6, 3, 4, 8, 7, 6, 5, 7, 7, 8, 7, 3, 6, 4, 4, 4, 7, 4, 8, 7, 7, 6, 6, 6, 6, 6, 3, 4, 7, 6, 6, 6, 6, 6, 4, 6, 7, 7, 3, 6, 7, 6, 6, 6, 4, 7, 6, 6, 6, 7, 7, 4, 6, 3, 6, 6, 6, 6, 7, 6, 6, 4, 4, 6, 6, 4, 4, 4, 6, 4, 6, 6, 6, 6, 6, 6, 4, 6, 4, 4, 6, 6, 6, 8, 6, 6), Cluster = c(4L, 4L, 2L, 2L, 2L, 2L, 4L, 3L, 3L, 2L, 3L, 2L, 4L, 4L, 2L, 4L, 2L, 2L, 4L, 4L, 2L, 3L, 3L, 2L, 3L, 2L, 1L, 4L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L, 1L, 2L, 4L, 2L, 1L, 2L, 4L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 3L, 2L, 2L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 3L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 4L, 1L, 2L, 4L, 4L, 2L, 3L, 2L, 2L, 2L, 4L, 4L, 1L, 2L, 4L, 4L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 4L, 2L, 3L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 4L, 2L, 2L, 4L, 1L, 2L, 2L, 4L, 2L, 4L, 3L, 4L, 2L, 3L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 3L, 2L, 3L, 1L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 2L, 3L, 4L, 3L, 4L, 4L, 2L, 4L, 4L, 2L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 3L, 1L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 4L, 3L, 2L, 2L, 1L)), row.names = c(NA, -274L), class = 
"data.frame")
Update in response to Update 2 from Dan Adams. My code:
education_plot <- ggplot(test1) +
  aes(Cluster, fill = as.character(Education)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(name = "Level of education", values = color_mapping, drop = FALSE)
His code:
data1 %>%
  ggplot(aes(x = Cluster)) +
  geom_bar(aes(fill = Education), stat = "count", position = "fill") +
  scale_fill_manual(values = color_mapping, drop = FALSE) +
  scale_y_continuous(labels = percent)