
My dataset:

I have data in the following format (here, imported from a CSV file). You can find an example dataset as CSV here.

PAIR   PREFERENCE
1      5
1      3
1      2
2      4
2      1
2      3

… and so on. In total, there are 19 pairs, and the PREFERENCE ranges from 1 to 5, as discrete values.
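To make the target quantity concrete, here is a minimal sketch in base R that tabulates the share of each PREFERENCE value within each pair. It uses only the six rows shown above, typed in by hand; the data frame name `d` follows the question, and the file name in the comment is hypothetical.

```r
# Six sample rows in the PAIR / PREFERENCE layout described above.
# With the real file this would be something like
#   d <- read.csv("preferences.csv")   # hypothetical file name
d <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

# Fix the levels so every pair reports all five categories,
# including those with zero votes.
d$PREFERENCE <- factor(d$PREFERENCE, levels = 1:5)

# Rows are pairs, columns are preference values.
counts <- table(d$PAIR, d$PREFERENCE)

# Divide each row by its total, so every pair sums to 1 (i.e. 100%).
shares <- prop.table(counts, margin = 1)

print(round(100 * shares, 1))
```

Each row of `shares` is what one 100% stacked column would display for that pair.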


What I'm trying to achieve:

What I need is a stacked histogram, i.e. a column 100% high for each pair, showing the distribution of the PREFERENCE values.

Something similar to the "100% stacked columns" in Excel, or (although not quite the same) a so-called "mosaic plot":


What I tried:

I figured it'd be easiest using ggplot2, but I don't even know where to start. I know I can create a simple bar chart with something like:

ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")

… that however doesn't get me very far. So I tried this, and it gets me somewhat closer to what I'm trying to achieve, but it still uses the count of PREFERENCE, I suppose? Note the ylab being "count" here, and the values ranging to 19.

qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))

Results in:

[image: the resulting stacked bar chart]

  • So, what do I have to do to get the stacked bars to represent a histogram?
  • Or do they actually do this already?
  • If so, what do I have to change to get the labels right (e.g. have percentages instead of the "count")?

By the way, this is not really related to this question, and only marginally related to this (i.e. probably same idea, but not continuous values, instead grouped into bars).

slhck
  • You mean something like this? http://stackoverflow.com/questions/3619067/stacked-bar-chart-in-r-ggplot2-with-y-axis-and-bars-as-percentage-of-counts – Roman Luštrik Jan 06 '12 at 12:38
  • @RomanLuštrik It's very similar, but the solution, adapted with my two variables, [outputs something weird](http://i.stack.imgur.com/2AJy2.png). Do you have any idea on how to proceed? I guess I'm almost there, I (probably) only need to change the scale to percentage. – slhck Jan 06 '12 at 13:02
  • A nit: this is not a histogram at all; bar charts are not histograms. That said, if the total 'count' for each factor is in fact 19, then yes, you're done. Replace count with, say, `ctpercent<-100*count/19` to get the desired y-values. – Carl Witthoft Jan 06 '12 at 15:37
  • @CarlWitthoft Well, I know the terminology is a bit off. Where exactly should I put `ctpercent<-100*count/19` in the `qplot` command? – slhck Jan 06 '12 at 15:40

1 Answer


Maybe you want something like this:

ggplot() + 
    geom_bar(data = dat,
             aes(x = factor(PAIR),fill = factor(PREFERENCE)),
             position = "fill")

where I've read your data into dat. This outputs something like this:

[image: the resulting plot]

The y label is still "count", but you can change that manually by adding:

+ scale_x_discrete("Pairs") + scale_y_continuous("Votes")
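For percentage tick labels instead of fractions from 0 to 1, one option is to pass a label formatter to the y scale. The following is a sketch rather than the definitive solution: it assumes the `scales` package is installed alongside `ggplot2`, uses only the six sample rows from the question as stand-in data, and the axis and legend titles are placeholders.

```r
library(ggplot2)

# Stand-in data: the six sample rows from the question.
dat <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

p <- ggplot(dat, aes(x = factor(PAIR), fill = factor(PREFERENCE))) +
  # position = "fill" rescales every bar to a total height of 1
  geom_bar(position = "fill") +
  scale_x_discrete("Pairs") +
  # scales::percent formats 0.25 as "25%" on the axis
  scale_y_continuous("Share of votes", labels = scales::percent) +
  labs(fill = "Preference")
```

Calling `print(p)` draws the chart. Because `position = "fill"` already normalises each bar, no manual division by the number of rows per pair is needed.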
joran