I have some data that I want to display as a box plot using ggplot2. It's basically counts, stratified by two other variables. Here's an example of the data (in reality there's a lot more, but the structure is the same):

TAG Count Condition
A     5         1
A     6         1
A     6         1
A     6         2
A     7         2
A     7         2
B     1         1
B     2         1
B     2         1
B    12         2
B     8         2
B    10         2
C    10         1
C    12         1
C    13         1
C     7         2
C     6         2
C    10         2
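For anyone who wants to reproduce this, the table above can be read straight into a data frame. This is only a sketch: the name `s` matches the data frame used in the plotting call in this question, and `read.table(text = ...)` is just one convenient way to load a pasted table.

```r
# Read the example table into a data frame named `s`
s <- read.table(header = TRUE, text = "
TAG Count Condition
A     5         1
A     6         1
A     6         1
A     6         2
A     7         2
A     7         2
B     1         1
B     2         1
B     2         1
B    12         2
B     8         2
B    10         2
C    10         1
C    12         1
C    13         1
C     7         2
C     6         2
C    10         2
")

# Three observations per TAG and Condition combination
table(s$TAG, s$Condition)
```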

For each TAG, there is a fixed number of observations in condition 1 and in condition 2 (here it's 3, but in the real data it's much more). I want a box plot like the following (`s` is a data frame arranged as above):

ggplot(s, aes(x=TAG, y=Count, fill=factor(Condition))) + geom_boxplot()

Plot of example data

This is fine, but I want to be able to order the x-axis by the p-value from a Wilcoxon test for each TAG. For example, with the above data, the values would be (for the tags A, B, and C respectively):

> wilcox.test(c(5,6,6),c(6,7,7))$p.value
[1] 0.1572992
> wilcox.test(c(1,2,2),c(12,8,10))$p.value
[1] 0.0765225
> wilcox.test(c(10,12,13),c(7,6,10))$p.value
[1] 0.1211833

This would induce the ordering A, C, B on the x-axis (largest p-value to smallest). But I don't know how to go about adding this information to my data (specifically, attaching a p-value at just the TAG level, rather than adding a whole extra column), or how to use it to change the x-axis order. Any help greatly appreciated.

Philip Uren
  • possible duplicate of [Order Bars in ggplot2 bar graph](http://stackoverflow.com/questions/5208679/order-bars-in-ggplot2-bar-graph) – joran Mar 29 '12 at 22:23
  • I know that other question is about bar graphs, but it's really the same question, with the same solution: make sure `TAG` is an ordered factor. – joran Mar 29 '12 at 22:24
  • Two questions in one. Also a possible duplicate of [sorting-of-categorical-variables-in-ggplot](http://stackoverflow.com/questions/5916779/sorting-of-categorical-variables-in-ggplot) – Etienne Low-Décarie Mar 30 '12 at 11:43
  • Thanks for the pointers, though those other questions (unless I'm mistaken) don't seem to be about ordering using a computed statistic, but rather using information already in the data frame. It may seem like a slim difference, but that was a substantial part of what I was looking for. In fact, I figured out how to achieve the result I wanted by computing the p-values beforehand, manually attaching them to the data and then ordering the TAG factor by that column. Ramnath's answer is substantially better than what I came up with though. – Philip Uren Apr 02 '12 at 18:19

1 Answer

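Here is a way to do it. The first step is to calculate the p-value for each TAG. We do this with ddply, which splits the data by TAG and calculates the p-value using the formula interface to wilcox.test. Here dfr is the data frame from the question (called s there). The plot statement then reorders TAG by its p-value; reorder sorts in ascending order, so use reorder(TAG, -pval) to put the largest p-value first.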

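library(ggplot2)
library(plyr)

# Split by TAG and attach that TAG's Wilcoxon p-value to each of its rows
dfr2 <- ddply(dfr, .(TAG), transform,
  pval = wilcox.test(Count ~ Condition)$p.value)

# Reorder the TAG levels by p-value and draw the box plot
qplot(reorder(TAG, pval), Count, fill = factor(Condition), geom = 'boxplot',
  data = dfr2)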

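If you would rather not depend on plyr or qplot, the same idea can be sketched with base R and a plain ggplot() call. This is an alternative under stated assumptions, not a replacement for the code above: the data frame is rebuilt here under the name `s` only so the snippet runs on its own, and `pvals` is a name introduced purely for illustration. It keeps one p-value per TAG in a named vector instead of an extra column, and sorts with decreasing = TRUE to get the largest-to-smallest order asked for in the question.

```r
library(ggplot2)

# Example data from the question, rebuilt so the snippet is self-contained
s <- data.frame(
  TAG = rep(c("A", "B", "C"), each = 6),
  Count = c(5, 6, 6, 6, 7, 7,
            1, 2, 2, 12, 8, 10,
            10, 12, 13, 7, 6, 10),
  Condition = rep(rep(1:2, each = 3), times = 3)
)

# One Wilcoxon p-value per TAG, stored as a named vector.
# With tied counts, wilcox.test warns that it cannot compute an exact p-value.
pvals <- sapply(split(s, s$TAG), function(d) {
  wilcox.test(Count ~ Condition, data = d)$p.value
})

# Set the factor levels in order of decreasing p-value
s$TAG <- factor(s$TAG, levels = names(sort(pvals, decreasing = TRUE)))

# ggplot2 draws a discrete x-axis in factor-level order
p <- ggplot(s, aes(x = TAG, y = Count, fill = factor(Condition))) +
  geom_boxplot()
p
```

Dropping decreasing = TRUE gives the ascending order that reorder(TAG, pval) produces.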

Ramnath