
I have this dataset from a survey:

                         Var1                 by variable value
1           Strongly disagree  Cluster 1 (n = 9)        A     0
2           Strongly disagree Cluster 2 (n = 15)        A     0
3           Somewhat disagree  Cluster 1 (n = 9)        A     0
4           Somewhat disagree Cluster 2 (n = 15)        A     0
5  Neither agree nor disagree  Cluster 1 (n = 9)        A     2
6  Neither agree nor disagree Cluster 2 (n = 15)        A     0
7              Somewhat agree  Cluster 1 (n = 9)        A     1
8              Somewhat agree Cluster 2 (n = 15)        A     0
9              Strongly agree  Cluster 1 (n = 9)        A     6
10             Strongly agree Cluster 2 (n = 15)        A    15
11          Strongly disagree  Cluster 1 (n = 9)        B     1
12          Strongly disagree Cluster 2 (n = 15)        B     0
13          Somewhat disagree  Cluster 1 (n = 9)        B     0
14          Somewhat disagree Cluster 2 (n = 15)        B     0
15 Neither agree nor disagree  Cluster 1 (n = 9)        B     1
16 Neither agree nor disagree Cluster 2 (n = 15)        B     0
17             Somewhat agree  Cluster 1 (n = 9)        B     4
18             Somewhat agree Cluster 2 (n = 15)        B     1
19             Strongly agree  Cluster 1 (n = 9)        B     3
20             Strongly agree Cluster 2 (n = 15)        B    14
21          Strongly disagree  Cluster 1 (n = 9)        C     0
22          Strongly disagree Cluster 2 (n = 15)        C     0
23          Somewhat disagree  Cluster 1 (n = 9)        C     0
24          Somewhat disagree Cluster 2 (n = 15)        C     0
25 Neither agree nor disagree  Cluster 1 (n = 9)        C     3
26 Neither agree nor disagree Cluster 2 (n = 15)        C     0
27             Somewhat agree  Cluster 1 (n = 9)        C     1
28             Somewhat agree Cluster 2 (n = 15)        C     3
29             Strongly agree  Cluster 1 (n = 9)        C     5
30             Strongly agree Cluster 2 (n = 15)        C    12
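To make the example reproducible, the table can be rebuilt as a data frame. This is only a sketch: it assumes the object is called `q5` (the name used in the plotting code) and stores the text columns as plain character vectors, whereas the real object may well use factors. The helper names `lvls` and `clusters` are made up for illustration.

```r
# Response levels and clusters, in the order they appear in the printed table
lvls <- c("Strongly disagree", "Somewhat disagree",
          "Neither agree nor disagree", "Somewhat agree", "Strongly agree")
clusters <- c("Cluster 1 (n = 9)", "Cluster 2 (n = 15)")

# expand.grid() varies its first argument fastest, which matches the row order
# of the table: cluster, then response level, then question
q5 <- expand.grid(by = clusters, Var1 = lvls, variable = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
q5 <- q5[, c("Var1", "by", "variable")]

# Counts copied from the `value` column, rows 1 to 30
q5$value <- c(0, 0, 0, 0, 2, 0, 1, 0, 6, 15,
              1, 0, 0, 0, 1, 0, 4, 1, 3, 14,
              0, 0, 0, 0, 3, 0, 1, 3, 5, 12)

# Sanity check: each question should total 9 in cluster 1 and 15 in cluster 2
aggregate(value ~ by + variable, data = q5, FUN = sum)
```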

I originally plotted it like so using ggplot2 to display the count of responses:

( p5 <- ggplot(q5, aes(x = Var1, y = value, fill = variable)) +
    geom_bar(stat = "identity", width = 0.5, position=position_dodge2(reverse = TRUE)) +
    coord_flip() +
    theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
    axis.title.x = element_text(size = 16),      
    axis.title.y = element_text(size = 16),
    axis.text.y = element_text(size = 16),
    legend.text=element_text(size=16),
    legend.title=element_text(size=16),
    strip.text.x = element_text(size = 16)) +
    ylim(0,20) +
    scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
    labs(x = "", y = "# of Responses", fill = "Question") +
    facet_grid(. ~ by) )

which gave me this:

[Image: bar chart of response counts per question, faceted by cluster]

However, I want to display the data as a percentage rather than count.

Following this post, I changed the code accordingly to:

( p5 <- ggplot(q5, aes(x = Var1, group = by, fill = variable)) +
    stat_count(mapping = aes(y = ..prop..)) +
    coord_flip() +
    theme(plot.title = element_text(size = 16), axis.text.x = element_text(size = 16),
    axis.title.x = element_text(size = 16),      
    axis.title.y = element_text(size = 16),
    axis.text.y = element_text(size = 16),
    legend.text=element_text(size=16),
    legend.title=element_text(size=16),
    strip.text.x = element_text(size = 16)) +
    scale_y_continuous(limits = c(0,1),labels = scales::percent_format(accuracy = 5L)) +
    scale_x_discrete(limits=c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree")) +
    labs(x = "", y = "% of Responses", fill = "Question") +
    facet_grid(. ~ by) )

However, this gives me this plot:

[Image: the resulting plot]

It seems like the plot is not recognizing my fill argument or the ..prop.. argument for y.

How can I fix this?

markus

1 Answer


I had trouble copy-pasting the data, so I made an example that resembles yours:

set.seed(111)
df = expand.grid(Var1=c("strong disagree","disagree","strong agree","agree","neither"),
by=1:2,variable=LETTERS[1:3])
df$value=rnbinom(nrow(df),mu=5,size=0.5)
df$value[df$Var1=="disagree" & df$by==1]=0

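The problem with the code above is that stat_count() counts rows rather than using your value column, and ..prop.. is then computed within each group (here, by). The easier solution, I think, is to compute the proportions first and then just plot them: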

library(ggplot2)
library(tidyr)
library(dplyr)

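# Convert counts to proportions within each cluster (by) and question (variable)
df %>%
  group_by(by, variable) %>%
  # 0/0 gives NaN when a group has no responses; show those as 0
  mutate(value = replace_na(value / sum(value), 0)) %>%
  ggplot(aes(x = Var1, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  facet_wrap(~by) +
  scale_y_continuous(labels = scales::percent_format()) +
  coord_flip()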
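Applied to the survey counts from the question, the same idea might look like the sketch below. It assumes the data frame is rebuilt by hand as `q5` with the column names printed in the question, and that the percentage wanted is the share of each response within a cluster and question (the interpretation settled on in the comments). The names `lvls`, `q5_pct`, `pct`, and `p` are made up for this illustration.

```r
library(ggplot2)
library(dplyr)

lvls <- c("Strongly disagree", "Somewhat disagree",
          "Neither agree nor disagree", "Somewhat agree", "Strongly agree")

# Rebuild the counts from the question (cluster varies fastest, then response)
q5 <- expand.grid(by = c("Cluster 1 (n = 9)", "Cluster 2 (n = 15)"),
                  Var1 = lvls, variable = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
q5$value <- c(0, 0, 0, 0, 2, 0, 1, 0, 6, 15,
              1, 0, 0, 0, 1, 0, 4, 1, 3, 14,
              0, 0, 0, 0, 3, 0, 1, 3, 5, 12)

# Share of each response within its cluster and question;
# a group with no responses at all is shown as 0 instead of NaN
q5_pct <- q5 %>%
  group_by(by, variable) %>%
  mutate(pct = if (sum(value) > 0) value / sum(value) else 0) %>%
  ungroup()

# Same layout as the count plot, with the y axis formatted as a percentage
p <- ggplot(q5_pct, aes(x = Var1, y = pct, fill = variable)) +
  geom_col(width = 0.5, position = position_dodge2(reverse = TRUE)) +
  coord_flip() +
  scale_x_discrete(limits = lvls) +
  scale_y_continuous(limits = c(0, 1),
                     labels = scales::percent_format(accuracy = 1)) +
  labs(x = "", y = "% of Responses", fill = "Question") +
  facet_grid(. ~ by)
p
```

Because the proportions are computed before plotting, `geom_col()` draws them as given and `fill = variable` is kept, so no `stat_count()` grouping is involved.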

StupidWolf
  • You solved the graphing issue but this doesn't work for me. It's incorrectly calculating the percentages and calculating NaN where there are zeros with your dplyr code. ``` 5 Neither agree nor disagree Cluster 1 (n = 9) A 2 0.333 6 Neither agree nor disagree Cluster 2 (n = 15) A 0 NaN ``` – Fully Aquatic Feb 14 '20 at 23:10
  • I see, there are clusters where it's all zeros. You can replace the NAs with zeros. Which percentage are you calculating? It's not very clear from your question, i.e. the percentage within cluster 1 or 2, or the percentage within "Strongly agree" etc., nested within 1 or 2? – StupidWolf Feb 14 '20 at 23:23
  • Percentage of responses to each category within each cluster (i.e. if you look at the first figure with just the number of responses, there are 6/9 strongly agrees for cluster 1, which I would like to display as 66% of responses). – Fully Aquatic Feb 14 '20 at 23:33
  • Then it's group by cluster and variable. Hey man, this is really not clear from your question. – StupidWolf Feb 14 '20 at 23:42
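The percentage described in the comments, 6 of 9 "Strongly agree" responses in cluster 1, can be checked without any packages. This sketch copies only the question A rows from the table in the question; `qa` and `pct` are hypothetical names, and base R's `ave()` is swapped in here for the answer's dplyr pipeline.

```r
# Question A counts only, copied from rows 1 to 10 of the table
qa <- data.frame(
  Var1 = rep(c("Strongly disagree", "Somewhat disagree",
               "Neither agree nor disagree", "Somewhat agree",
               "Strongly agree"), each = 2),
  by = rep(c("Cluster 1 (n = 9)", "Cluster 2 (n = 15)"), times = 5),
  value = c(0, 0, 0, 0, 2, 0, 1, 0, 6, 15)
)

# ave() returns each row's cluster total, so this divides every count
# by the number of responses in its own cluster
qa$pct <- qa$value / ave(qa$value, qa$by, FUN = sum)

# "Strongly agree" rows: 6/9 for cluster 1 and 15/15 for cluster 2
qa[qa$Var1 == "Strongly agree", ]
```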