How to get a clustered stacked bar chart in R?

Question

I have a data set like the following (simulated here):

n=50

df = data.frame(
    id = c(seq(1,n), seq(1,n)),
    pre_post = c(rep(0,n), rep(1,n)),
    q1 = sample(1:5, 2*n, replace = TRUE),
    q2 = sample(1:5, 2*n, replace = TRUE),
    q3 = sample(1:5, 2*n, replace = TRUE),
    q4 = sample(1:5, 2*n, replace = TRUE)
)

df$pre_post = as.factor(df$pre_post)
df$q1 = as.factor(df$q1)
df$q2 = as.factor(df$q2)
df$q3 = as.factor(df$q3)
df$q4 = as.factor(df$q4)

head(df)

I want a graph with all the questions on the x axis, where each stack shows the number of people who responded 1, 2, ..., 5, separately for pre and post.

How can I achieve this?

I have 10 such questions and need to plot them in a single graph.

Essentially, I want to compare the frequency of each response level for each question between pre and post.

What I have tried:

library(reshape2)
library(ggplot2)

melted = melt(df, id.vars = c('id','pre_post'))

ggplot(melted, aes(x = pre_post, y =id , fill = value)) + 
  geom_bar(stat = 'identity', position = 'stack') + facet_grid(~variable)

This gave me the following plot, but the graph does not seem to be correct. Where did I go wrong?

Might need facets: `https://stackoverflow.com/questions/47085795/clustered-and-stacked-bar-plot-with-multiple-csv-files` — Mike H., Apr 16 '18 at 19:18
"But this graph seems to be not correct." What about it isn't correct? What do you expect or need it to look like? — camille, Apr 16 '18 at 19:22
The number of observations is only 50, but it shows more. @camille — David, Apr 16 '18 at 19:28
`y` shouldn't be `id`. Leave `y` blank. Also remove `stat = "identity"` since you want count data — Mike H., Apr 16 '18 at 19:30
Yeah, that's working, but how do I show the 50 patients on the y axis? Should I add ticks? Otherwise, at least I want to put the frequencies in the stacks. @MikeH. — David, Apr 16 '18 at 19:31
Without the `y` and `stat` set (as suggested by @Mike H.), the y-axis shows you counts from 0 to 50, corresponding to your 50 patients, with ticks per 10. Is that not what you wanted? — 4rj4n, Apr 16 '18 at 20:05
If I remove `stat` it gives me the following error: `Error: stat_count() must not be used with a y aesthetic.` @4rj4n — David, Apr 16 '18 at 20:42
I have kept `y=' '` and tried both removing `stat =' identity'` and using something like `stat = ' '`. @LukeC — David, Apr 17 '18 at 02:55
Can anyone explain why we should remove `y =id`? — David, Apr 17 '18 at 02:56
@gloom I see- try `ggplot(melted, aes(x = pre_post, fill = value)) + geom_bar(position = 'stack') + facet_grid(~variable)`. There is no `y` argument at all, in this case, because you are simply wanting (as I understand it) a count of the `value` column at each `x` in each `variable`. — Luke C, Apr 17 '18 at 03:20
Luke C., and originally Mike H., are correct. If you want counts, you don't need to (and shouldn't) specify the `y` argument. The documentation for bar charts at [tidyverse](http://ggplot2.tidyverse.org/reference/geom_bar.html) states that geom_bar is designed to make it easy to create bar charts that show counts, and the example graphs and their code have no `y` argument for count graphs either. — 4rj4n, Apr 17 '18 at 08:48

score 1 · Accepted Answer · answered Apr 17 '18 at 12:41

As people have mentioned in the comments, geom_bar is designed to work with no y input when you want counts. With y = id and stat = "identity", each bar stacks the id values themselves, so its height is the sum of the IDs in that bar rather than the number of respondents, which isn't what you want. By default geom_bar uses stat_count, rather than stat_identity, to count rows behind the scenes and then map that count to your y values.

So you can keep everything really simple (no y, no stat) and let geom_bar set that up for you.

library(ggplot2)
library(reshape2)

n=50

df = data.frame(
    id =c(seq(1,n),seq(1,n)), 
    pre_post = c(rep(0,n),rep(1,n)), 
    q1 = sample(1:5,2*n, replace = TRUE), 
    q2 = sample(1:5,2*n, replace = TRUE),
    q3 = sample(1:5,2*n, replace = TRUE),
    q4 = sample(1:5,2*n, replace = TRUE)
)

df$pre_post = as.factor(df$pre_post)
df$q1 = as.factor(df$q1)
df$q2 = as.factor(df$q2)
df$q3 = as.factor(df$q3)
df$q4 = as.factor(df$q4)


melted <- melt(df, id.vars = c('id','pre_post'))


ggplot(melted, aes(x = pre_post, fill = value)) +
    geom_bar(position = "stack") +
    facet_grid(~ variable)
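To see what geom_bar is counting behind the scenes, the same numbers can be worked out by hand. The following is only a sketch in base R: the seed, the `questions` vector, and the hand-built `long` data frame (standing in for `melted`) are assumptions added for illustration, not part of the original code.

```r
# Simulated data in the same layout as the question (seed added for reproducibility).
set.seed(1)
n <- 50
df <- data.frame(
  id = rep(seq_len(n), 2),
  pre_post = factor(rep(c(0, 1), each = n)),
  q1 = sample(1:5, 2 * n, replace = TRUE),
  q2 = sample(1:5, 2 * n, replace = TRUE),
  q3 = sample(1:5, 2 * n, replace = TRUE),
  q4 = sample(1:5, 2 * n, replace = TRUE)
)

# Long format built by hand: one row per id / pre_post / question.
questions <- c("q1", "q2", "q3", "q4")
long <- data.frame(
  id = rep(df$id, times = length(questions)),
  pre_post = rep(df$pre_post, times = length(questions)),
  variable = rep(questions, each = nrow(df)),
  value = unlist(df[questions], use.names = FALSE)
)

# What stat_count computes: the number of rows per answer within each bar.
counts <- as.data.frame(
  table(variable = long$variable, pre_post = long$pre_post, value = long$value)
)

# Total height of each bar when counting rows: 50 respondents per bar.
bar_totals <- aggregate(Freq ~ variable + pre_post, data = counts, FUN = sum)

# Total height of each bar when stacking y = id: the sum of the ids 1..50.
id_totals <- aggregate(id ~ variable + pre_post, data = long, FUN = sum)

print(bar_totals)
print(id_totals)
```

Every bar total in `bar_totals` is 50, while every total in `id_totals` is sum(1:50) = 1275, which is why the original plot's y axis ran far past 50. If you prefer to plot precomputed counts, passing `counts` to geom_col with y = Freq should give the same figure as geom_bar on the raw rows.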

I made this second example since you mentioned showing the frequencies of each answer. You can use a call to stat_count to make a text geom for labels. Note that after_stat(count) is the current replacement for ..count.. (ggplot2 3.3.0 and later); the development version of ggplot2 at the time of writing spelled it calc(count), and on older releases ..count.. still works.

ggplot(melted, aes(x = pre_post, fill = value)) +
    geom_bar(position = "stack") +
    stat_count(aes(label = after_stat(count)), geom = "text", position = position_stack(vjust = 0.5)) +
    facet_grid(~ variable)
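An equivalent way to write the label layer, if it reads more naturally to you, is geom_text with stat = "count". This is a sketch under a few assumptions: it uses after_stat(), so it needs ggplot2 3.3.0 or later, and the seed and the hand-built `long` data frame (in place of `melted`) are made up here so the block runs on its own.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 for after_stat()

# Simulated long-format data: 50 respondents, pre/post, four questions.
set.seed(1)
n <- 50
questions <- c("q1", "q2", "q3", "q4")
long <- data.frame(
  pre_post = factor(rep(rep(c(0, 1), each = n), times = length(questions))),
  variable = rep(questions, each = 2 * n),
  value = factor(
    sample(1:5, 2 * n * length(questions), replace = TRUE),
    levels = 1:5
  )
)

p <- ggplot(long, aes(x = pre_post, fill = value)) +
  geom_bar(position = "stack") +
  # Same counting stat as the bars, drawn as text centred in each segment.
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_stack(vjust = 0.5)
  ) +
  facet_grid(~ variable)

p
```

Both layers use the count stat and the same stacking position, so each label should sit in the middle of the segment it describes.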

Created on 2018-04-17 by the reprex package (v0.2.0).

Oh...Superb...+1 for counts in the stacks. @camille — David, Apr 18 '18 at 08:53
