I have a data set like the following (simulated here):

n=50

df = data.frame(id =c(seq(1,n),seq(1,n)), pre_post = c(rep(0,n),rep(1,n)), q1 = sample(1:5,2*n, replace = TRUE), q2 = sample(1:5,2*n, replace = TRUE),q3 = sample(1:5,2*n, replace = TRUE),q4 = sample(1:5,2*n, replace = TRUE))

df$pre_post = as.factor(df$pre_post)
df$q1 = as.factor(df$q1)
df$q2 = as.factor(df$q2)
df$q3 = as.factor(df$q3)
df$q4 = as.factor(df$q4)

head(df)

[Image: output of head(df)]

I want a graph with all the questions on the x-axis, where the stacks show the number of people who responded 1, 2, ..., 5, for both pre and post.

How can I achieve this?

I have 10 such questions and I need to plot them in a single graph.

Essentially, I want to compare the frequency of each factor level for each question, pre versus post.

What I have tried:

library(reshape2)
library(ggplot2)

melted = melt(df, id.vars = c('id','pre_post'))

ggplot(melted, aes(x = pre_post, y =id , fill = value)) + 
  geom_bar(stat = 'identity', position = 'stack') + facet_grid(~variable)

This gave me the following plot, but it does not look correct. Where did I go wrong?

[Image: the resulting faceted bar plot]

David
  • Might need facets: `https://stackoverflow.com/questions/47085795/clustered-and-stacked-bar-plot-with-multiple-csv-files` – Mike H. Apr 16 '18 at 19:18
  • "But this graph seems to be not correct." What about it isn't correct? What do you expect or need it to look like? – camille Apr 16 '18 at 19:22
  • The number of observations is only 50, but the plot shows more. @camille – David Apr 16 '18 at 19:28
  • `y` shouldn't be `id`. Leave `y` blank. Also remove `stat = "identity"`, since you want count data. – Mike H. Apr 16 '18 at 19:30
  • Yeah, that's working, but how do I show the 50 patients on the y-axis? Should I add ticks? Otherwise, I would at least like to put the frequencies in the stacks. @MikeH. – David Apr 16 '18 at 19:31
  • Without the `y` and `stat` set (as suggested by @Mike H.), the y-axis shows you counts from 0 to 50, corresponding to your 50 patients, with ticks per 10. Is that not what you wanted? – 4rj4n Apr 16 '18 at 20:05
  • If I remove `stat`, I get the following error: `Error: stat_count() must not be used with a y aesthetic.` @4rj4n – David Apr 16 '18 at 20:42
  • Did you remove the `y = id` as suggested by Mike H? – Luke C Apr 16 '18 at 21:26
  • I have kept `y=' '` and tried both removing `stat =' identity'` and keeping something like `stat = ' '` @LukeC – David Apr 17 '18 at 02:55
  • And can anyone explain why we should remove `y = id`? – David Apr 17 '18 at 02:56
  • @gloom I see. Try `ggplot(melted, aes(x = pre_post, fill = value)) + geom_bar(position = 'stack') + facet_grid(~variable)`. There is no `y` argument at all in this case, because you simply want (as I understand it) a count of the `value` column at each `x` in each `variable`. – Luke C Apr 17 '18 at 03:20
  • Luke C., and originally Mike H., are correct. If you want counts, you don't need to (and in fact shouldn't) specify the `y` argument. The documentation for bar charts at [tidyverse](http://ggplot2.tidyverse.org/reference/geom_bar.html) states that geom_bar is designed to make it easy to create bar charts that show counts, and the example graphs and their code contain no `y` argument for count charts either. – 4rj4n Apr 17 '18 at 08:48

1 Answer

As people have mentioned in the comments, geom_bar is designed to work with no y input. Combining y = id with stat = 'identity' stacks the id values themselves, so the height of each bar becomes the sum of the IDs in that group rather than the number of respondents, which isn't what you want. By default, geom_bar uses stat_count rather than stat_identity: it counts the rows behind the scenes and maps that count to y.
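To make the difference concrete, here is a minimal base-R sketch of what the two stats end up stacking. It rebuilds only one question column from the example, and the `set.seed()` call and the names `bar_height_identity`, `bar_height_count`, and `segment_counts` are my own additions for illustration, not part of the original code.

```r
# Rebuild a one-question version of the example data.
n <- 50
set.seed(1)  # assumption: seed added only so the sketch is repeatable
df <- data.frame(
    id = c(seq_len(n), seq_len(n)),
    pre_post = factor(c(rep(0, n), rep(1, n))),
    q1 = factor(sample(1:5, 2 * n, replace = TRUE), levels = 1:5)
)

# With y = id and stat = "identity", a stacked bar adds up the id values.
bar_height_identity <- tapply(df$id, df$pre_post, sum)

# With the default stat_count, each row contributes one unit.
bar_height_count <- table(df$pre_post)

# Under counting, the segments of each bar are the per-answer frequencies.
segment_counts <- table(df$pre_post, df$q1)

print(bar_height_identity)  # 1275 per group, i.e. 1 + 2 + ... + 50
print(bar_height_count)     # 50 per group
print(segment_counts)
```

The identity version reaches 1275 per bar because it sums the ids 1 through 50, which is why the original plot showed far more than 50 on the y-axis.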

So you can keep everything really simple (no y, no stat) and let geom_bar set that up for you.

library(ggplot2)
library(reshape2)

n=50

df = data.frame(
    id =c(seq(1,n),seq(1,n)), 
    pre_post = c(rep(0,n),rep(1,n)), 
    q1 = sample(1:5,2*n, replace = TRUE), 
    q2 = sample(1:5,2*n, replace = TRUE),
    q3 = sample(1:5,2*n, replace = TRUE),
    q4 = sample(1:5,2*n, replace = TRUE)
)

df$pre_post = as.factor(df$pre_post)
df$q1 = as.factor(df$q1)
df$q2 = as.factor(df$q2)
df$q3 = as.factor(df$q3)
df$q4 = as.factor(df$q4)


melted <- melt(df, id.vars = c('id','pre_post'))


ggplot(melted, aes(x = pre_post, fill = value)) +
    geom_bar(position = "stack") +
    facet_grid(~ variable)
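Since the question mentions 10 such items, converting each column by hand gets repetitive. The following is a rough sketch of one way to do the factor conversion in a single step. The column names `q1` to `q10`, the helper names `n_questions` and `q_cols`, and the "pre"/"post" labels for 0/1 are assumptions on my part.

```r
n <- 50
n_questions <- 10  # assumption: ten items named q1 ... q10
q_cols <- paste0("q", seq_len(n_questions))

# Simulated data with the same layout as the example above.
df <- data.frame(
    id = rep(seq_len(n), 2),
    pre_post = rep(c(0, 1), each = n)
)
for (col in q_cols) {
    df[[col]] <- sample(1:5, 2 * n, replace = TRUE)
}

# Assumption: 0 means "pre" and 1 means "post".
df$pre_post <- factor(df$pre_post, levels = c(0, 1), labels = c("pre", "post"))

# Convert every question column at once; fixing levels = 1:5 keeps all five
# answers in the legend even if one of them never occurs in a column.
df[q_cols] <- lapply(df[q_cols], factor, levels = 1:5)

str(df)
```

The resulting data frame can be passed to melt() and ggplot() exactly as above; facet_grid(~ variable) then produces one panel per question.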

I made this second example since you mentioned showing the frequencies of each answer. You can use a call to stat_count to make a text geom for labels. Note that after_stat(count), available in ggplot2 3.3.0 and later, replaces the older ..count.. notation; the calc(count) spelling from the development version at the time this was written was renamed before release.

ggplot(melted, aes(x = pre_post, fill = value)) +
    geom_bar(position = "stack") +
    stat_count(aes(label = after_stat(count)), geom = "text", position = position_stack(vjust = 0.5)) +
    facet_grid(~ variable)
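If you want to check the numbers printed in the stacks, the same counts can be tabulated directly. This is only a sketch in base R: it builds the long format by hand instead of calling melt(), and the names `long` and `counts` are hypothetical.

```r
n <- 50
q_cols <- paste0("q", 1:4)

# Same layout as the example data, with the answers kept as integers for now.
df <- data.frame(
    id = rep(seq_len(n), 2),
    pre_post = factor(rep(c(0, 1), each = n))
)
for (col in q_cols) {
    df[[col]] <- sample(1:5, 2 * n, replace = TRUE)
}

# Long format: one row per (respondent, time point, question).
long <- data.frame(
    pre_post = rep(df$pre_post, times = length(q_cols)),
    variable = rep(q_cols, each = nrow(df)),
    value = factor(unlist(df[q_cols], use.names = FALSE), levels = 1:5)
)

# One row per question x time point x answer, with its frequency in Freq.
counts <- as.data.frame(xtabs(~ variable + pre_post + value, data = long))
head(counts)

# Every bar should add up to the 50 respondents.
aggregate(Freq ~ variable + pre_post, data = counts, FUN = sum)
```

Each Freq value corresponds to one labelled segment in the plot, and the aggregate() call confirms that every bar totals 50.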

Created on 2018-04-17 by the reprex package (v0.2.0).

camille