Change count to percentage on faceted, filled geom_bar()/stat_count() plot in ggplot2 R

Question

I have this dataset from a survey:

                         Var1                 by variable value
1           Strongly disagree  Cluster 1 (n = 9)        A     0
2           Strongly disagree Cluster 2 (n = 15)        A     0
3           Somewhat disagree  Cluster 1 (n = 9)        A     0
4           Somewhat disagree Cluster 2 (n = 15)        A     0
5  Neither agree nor disagree  Cluster 1 (n = 9)        A     2
6  Neither agree nor disagree Cluster 2 (n = 15)        A     0
7              Somewhat agree  Cluster 1 (n = 9)        A     1
8              Somewhat agree Cluster 2 (n = 15)        A     0
9              Strongly agree  Cluster 1 (n = 9)        A     6
10             Strongly agree Cluster 2 (n = 15)        A    15
11          Strongly disagree  Cluster 1 (n = 9)        B     1
12          Strongly disagree Cluster 2 (n = 15)        B     0
13          Somewhat disagree  Cluster 1 (n = 9)        B     0
14          Somewhat disagree Cluster 2 (n = 15)        B     0
15 Neither agree nor disagree  Cluster 1 (n = 9)        B     1
16 Neither agree nor disagree Cluster 2 (n = 15)        B     0
17             Somewhat agree  Cluster 1 (n = 9)        B     4
18             Somewhat agree Cluster 2 (n = 15)        B     1
19             Strongly agree  Cluster 1 (n = 9)        B     3
20             Strongly agree Cluster 2 (n = 15)        B    14
21          Strongly disagree  Cluster 1 (n = 9)        C     0
22          Strongly disagree Cluster 2 (n = 15)        C     0
23          Somewhat disagree  Cluster 1 (n = 9)        C     0
24          Somewhat disagree Cluster 2 (n = 15)        C     0
25 Neither agree nor disagree  Cluster 1 (n = 9)        C     3
26 Neither agree nor disagree Cluster 2 (n = 15)        C     0
27             Somewhat agree  Cluster 1 (n = 9)        C     1
28             Somewhat agree Cluster 2 (n = 15)        C     3
29             Strongly agree  Cluster 1 (n = 9)        C     5
30             Strongly agree Cluster 2 (n = 15)        C    12

I originally plotted it like so using ggplot2 to display the count of responses:

( p5 <- ggplot(q5, aes(x = Var1, y = value, fill = variable)) +
    geom_bar(stat = "identity", width = 0.5, position=position_dodge2(reverse = TRUE)) +
    coord_flip() +
    theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
    axis.title.x = element_text(size = 16),      
    axis.title.y = element_text(size = 16),
    axis.text.y = element_text(size = 16),
    legend.text=element_text(size=16),
    legend.title=element_text(size=16),
    strip.text.x = element_text(size = 16)) +
    ylim(0,20) +
    scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
    labs(x = "", y = "# of Responses", fill = "Question") +
    facet_grid(. ~ by) )

which gave me this:

However, I want to display the data as a percentage rather than count.

Following this post, I changed the code accordingly to:

( p5 <- ggplot(q5, aes(x = Var1, group = by, fill = variable)) +
    stat_count(mapping = aes(y = ..prop..)) +
    coord_flip() +
    theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
    axis.title.x = element_text(size = 16),      
    axis.title.y = element_text(size = 16),
    axis.text.y = element_text(size = 16),
    legend.text=element_text(size=16),
    legend.title=element_text(size=16),
    strip.text.x = element_text(size = 16)) +
    scale_y_continuous(limits = c(0,1),labels = scales::percent_format(accuracy = 5L)) +
    scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
    labs(x = "", y = "% of Responses", fill = "Question") +
    facet_grid(. ~ by) )

However, this gives me this plot:

It seems like the plot is not recognizing my fill argument or the ..prop.. argument for y.

How can I fix this?

StupidWolf · Accepted Answer · 2020-02-14T23:41:34.963

I had trouble copying and pasting your data, so I made an example dataset with the same structure:

set.seed(111)
df = expand.grid(Var1=c("strong disagree","disagree","strong agree","agree","neither"),
by=1:2,variable=LETTERS[1:3])
df$value=rnbinom(nrow(df),mu=5,size=0.5)
df$value[df$Var1=="disagree" & df$by==1]=0

The problem with your second attempt is that stat_count() counts rows within the group you set (group = by) and does not use your value column at all. I think the easier solution is to calculate the proportions first and then just plot them:

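library(ggplot2)
library(tidyr)
library(dplyr)

df %>%
  # proportions are calculated within each cluster (by) and question (variable)
  group_by(by, variable) %>%
  # a group whose values are all zero gives 0/0 = NaN; replace those with 0
  mutate(value = replace_na(value / sum(value), 0)) %>%
  ggplot(aes(x = Var1, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  facet_wrap(~by) +
  scale_y_continuous(labels = scales::percent_format()) +
  coord_flip()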
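
To see why the stat_count() attempt in the question could not show the right heights, it may help to compare what a row count sees with what the value column holds. The following is a minimal base R sketch on the same kind of example data as above (one row per Var1, by and variable combination). The names row_counts and value_sums are only illustrative.

```r
# Example data with the same shape as above: one pre-aggregated row
# per (Var1, by, variable) combination.
df <- expand.grid(
  Var1 = c("strong disagree", "disagree", "strong agree", "agree", "neither"),
  by = 1:2,
  variable = LETTERS[1:3]
)
set.seed(111)
df$value <- rnbinom(nrow(df), mu = 5, size = 0.5)

# What an unweighted count sees: the number of rows per (Var1, by).
# Every cell has one row per question, whatever is stored in value.
row_counts <- table(df$Var1, df$by)
print(row_counts)

# What the plot is meant to show: the sum of value per (Var1, by).
value_sums <- xtabs(value ~ Var1 + by, data = df)
print(value_sums)
```

Because the data are already aggregated, every cell of row_counts is the same, so a proportion based on row counts comes out identical for every bar. Computing the proportions from value before plotting avoids this. Supplying value through the weight aesthetic of geom_bar() should be another option, although it is not tested here.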

edited Feb 14 '20 at 23:41

answered Feb 14 '20 at 22:40


You solved the graphing issue but this doesn't work for me. It's incorrectly calculating the percentages and calculating NaN where there are zeros with your dplyr code. ``` 5 Neither agree nor disagree Cluster 1 (n = 9) A 2 0.333 6 Neither agree nor disagree Cluster 2 (n = 15) A 0 NaN ``` – Fully Aquatic Feb 14 '20 at 23:10
I see, there are clusters where it's all zeros. You can replace the NAs with zeros. which percentage are you calculating? It's not very clear from your question. I.e percentage within 1 or 2, or percentage within strong agree etc, nested within 1 or 2? – StupidWolf Feb 14 '20 at 23:23
Percentage of responses to each category within each cluster (i.e. if you look at the first figure with just the number of responses, there are 6/9 strongly agrees for cluster 1, which I would like to display as 66% of responses). – Fully Aquatic Feb 14 '20 at 23:33
Then it's group by cluster and variable.. hey man this is really not clear from your question – StupidWolf Feb 14 '20 at 23:42
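
The percentage described in the comments (responses to each category within each cluster, for each question) can be checked on the question A rows from the original table. This is only a sketch: the names q5_a, q5_pct, total and pct are illustrative, and the if_else() guard is an assumption about how a group with no responses should be displayed (as 0 rather than NaN).

```r
library(dplyr)

# Question A rows copied from the survey table in the question.
levels5 <- c("Strongly disagree", "Somewhat disagree",
             "Neither agree nor disagree", "Somewhat agree", "Strongly agree")
q5_a <- data.frame(
  Var1     = rep(levels5, each = 2),
  by       = rep(c("Cluster 1 (n = 9)", "Cluster 2 (n = 15)"), times = 5),
  variable = "A",
  value    = c(0, 0, 0, 0, 2, 0, 1, 0, 6, 15)
)

q5_pct <- q5_a %>%
  group_by(by, variable) %>%                          # one total per cluster and question
  mutate(total = sum(value),
         pct = if_else(total > 0, value / total, 0)) %>%  # avoid 0/0 = NaN
  ungroup()

# Strongly agree, Cluster 1, question A: 6 / 9, about 0.667
q5_pct %>%
  filter(Var1 == "Strongly agree", by == "Cluster 1 (n = 9)") %>%
  pull(pct)
```

The resulting pct column can then be passed to geom_col() as the y aesthetic, in the same way as value in the answer's pipeline.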
