I have a columnar data set from which I am plotting a series of box plots, most similar to the setup in this example: Boxplot of table using ggplot2

library(reshape2)
library(ggplot2)
ggplot(data = melt(dd), aes(x = variable, y = value)) + geom_boxplot(aes(fill = variable))

However, in my case, each of the boxplots represents a different number of data points. For example, Column A might have 8000 data points, Column B might have 6000, Column C might have 2500, and Column D might have 800.

To help communicate this, I thought I could vary the alpha (transparency) of each box's fill color to reflect the number of data points. The darker the box, the more data points were used in computing the statistics the boxplot represents.

In the ggplot2 help file for geom_histogram, they use aes(fill=..count..) to shade each bin according to the number of counts in it.

m <- ggplot(movies, aes(x=rating))    
m + geom_histogram(aes(fill=..count..))

I tried using this with geom_boxplot, but it doesn't seem to recognize ..count.. there. Here is the line that generates my boxplot:

ggplot(meltedData, aes(x=variable, y=value)) + geom_boxplot(aes(fill=variable), outlier.size = 1) + ylim(-4,3)

Does anyone have any pointers? I know I can set the "alpha" property in geom_boxplot, but how can I apply it to each boxplot individually, based on the number of data points in that boxplot?

Thanks in advance.

Saket Vora
  • could you please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of the columns you're trying to plot? – canary_in_the_data_mine Jul 16 '13 at 18:04
  • I don't know the whole `..count..` system very well, but I think it works with histograms because of the `stat="bin"` argument. You may have to just add `count` to the data itself. – Señor O Jul 16 '13 at 18:08

3 Answers

stat_boxplot doesn't calculate the count. Just do it outside of ggplot2:

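library(plyr)
# Add a column holding the number of rows in each cyl group
DF <- ddply(mtcars, .(cyl), transform, myalpha = length(cyl))

library(ggplot2)
# Map alpha to that count; the fill color is fixed, so it stays outside aes()
ggplot(DF, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = myalpha), fill = "blue")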
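
If you would rather not load plyr, the same count column can be built in base R with ave(). This is only a sketch applied to data shaped like the question's: meltedData here is simulated, with the group sizes mentioned in the question, and the n_obs column name and the alpha range of 0.3 to 1 are my own choices.

```r
library(ggplot2)

# Simulated long-format data (a hypothetical stand-in for meltedData),
# using the group sizes from the question: 8000, 6000, 2500, 800.
set.seed(1)
sizes <- c(A = 8000, B = 6000, C = 2500, D = 800)
meltedData <- data.frame(
  variable = factor(rep(names(sizes), times = sizes)),
  value    = rnorm(sum(sizes))
)

# Per-row group size: ave() applies length() within each level of 'variable'.
meltedData$n_obs <- ave(meltedData$value, meltedData$variable, FUN = length)

# Keep the per-variable fill from the question and map alpha to the count.
p <- ggplot(meltedData, aes(x = variable, y = value)) +
  geom_boxplot(aes(fill = variable, alpha = n_obs), outlier.size = 1) +
  scale_alpha_continuous(range = c(0.3, 1), name = "n")
print(p)
```

Setting the lower end of the alpha range above zero keeps the box with the fewest points from becoming nearly invisible.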

Roland

My version of Roland's solution using the dplyr package:

library(dplyr)
library(ggplot2)

df <- mtcars %>%
  group_by(cyl) %>%
  mutate(my_alpha = length(cyl))

ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = my_alpha), fill = 'blue')
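
A small variation on the above, offered as a sketch rather than the definitive way: dplyr's n() returns the group size directly, and scale_alpha_continuous() lets you choose how faint the smallest group gets. The n_obs column name, the legend title, and the 0.2 to 1 range are assumptions picked for illustration.

```r
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  group_by(cyl) %>%
  mutate(n_obs = n()) %>%   # n() is the number of rows in the current group
  ungroup()

p <- ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = n_obs), fill = "blue") +
  # Limit the alpha range so the smallest group is still visible
  scale_alpha_continuous(range = c(0.2, 1), name = "n per box")
print(p)
```
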
Tiana

data.table option:

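library(data.table)
# dd is assumed to be the melted (long-format) data with a 'variable' column
dd <- data.table(dd)
dd[, Count := .N, by = variable]  # add the per-group row count by reference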
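
To show where the Count column ends up being used, here is a minimal end-to-end sketch. The dd below is simulated long-format data standing in for the melted table from the question; the group labels and sizes are made up for illustration.

```r
library(data.table)
library(ggplot2)

# Hypothetical long-format stand-in for the melted 'dd' from the question.
set.seed(42)
dd <- data.table(
  variable = rep(c("A", "B", "C"), times = c(300, 120, 40)),
  value    = rnorm(460)
)

# .N is the number of rows in each 'variable' group; := adds it by reference.
dd[, Count := .N, by = variable]

# Map alpha to the count so boxes built from more points are drawn darker.
p <- ggplot(dd, aes(x = variable, y = value)) +
  geom_boxplot(aes(alpha = Count), fill = "blue")
print(p)
```
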
Señor O
  • Sure. What do you mean by "at least"? – Señor O Jul 16 '13 at 18:13
  • I just don't see the need to list every possible way to do this each time split-apply-combine is needed in an answer. We really need a good FAQ giving all the possibilities. I chose `plyr` here because I was already in the hadleyverse. – Roland Jul 16 '13 at 18:16