
I am creating boxplots using ggplot and would like to represent the sample size contributing to each box. In the base plot function there is the varwidth option. Does it have an equivalent in ggplot?

For example, in base graphics:

data <- data.frame(rbind(cbind(rnorm(700, 0,10), rep("1",700)),
                         cbind(rnorm(50, 0,10), rep("2",50))))
data[ ,1] <- as.numeric(as.character(data[,1]))
plot(data[,1] ~ as.factor(data[,2]), varwidth = TRUE)


N Brouwer
  • I seem to recall someone asking this on the mailing list quite a while ago and they were told it wasn't possible. I don't see anything referencing this in the issues on github, so it might still not be possible. (An alternative is to use fill colors.) – joran Sep 28 '12 at 21:57
  • Not possible with ggplot. If you're only generating one plot, you could modify it in Illustrator or something similar. – Omar Wagih Sep 29 '12 at 04:32
  • @joran I have learnt from bitter experience that calling anything in R impossible just serves as bait for someone to prove you wrong. In this case the mighty @kohske provided a workaround. – Andrie Sep 29 '12 at 06:02
  • How many points do you have per boxplot? – Roman Luštrik Sep 29 '12 at 07:32
  • This has now been implemented with the `varwidth` argument. See this question: http://stackoverflow.com/q/25171210/3897439 – Cotton.Rockwood Oct 20 '14 at 19:44

2 Answers


It's not elegant, but you can do it by drawing each box as a separate layer with its own width:

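library(ggplot2)

# Simulated data: 700 observations in group "1", 50 in group "2"
data <- data.frame(rbind(cbind(rnorm(700, 0,10), rep("1",700)),
                         cbind(rnorm(50, 0,10), rep("2",50))))
data[ ,1] <- as.numeric(as.character(data[,1]))
# Box widths proportional to the square root of each group's share of the data
w <- sqrt(table(data$X2)/nrow(data))
# One geom_boxplot layer per group, each with its own width
ggplot(NULL, aes(factor(X2), X1)) +
  geom_boxplot(width = w[1], data = subset(data, X2 == 1)) +
  geom_boxplot(width = w[2], data = subset(data, X2 == 2))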


If X2 has several levels, you can avoid hardcoding each one:

library(plyr)  # provides llply()

ggplot(NULL, aes(factor(X2), X1)) +
  llply(unique(data$X2), function(i) geom_boxplot(width = w[i], data = subset(data, X2 == i)))
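
To make the width vector `w` more concrete, here is a minimal base-R sketch. It assumes the same group sizes as the example (700 and 50); the names `groups` and `n` are only for illustration. The square root mirrors base R's `varwidth`, which draws boxes with widths proportional to the square root of each group's sample size.

```r
# Group labels matching the example: 700 observations in "1", 50 in "2".
groups <- c(rep("1", 700), rep("2", 50))

# Observations per group, as a named table.
n <- table(groups)

# Relative widths: square root of each group's share of the data.
w <- sqrt(n / sum(n))

print(round(w, 3))
```

Because `w` keeps the group names from `table()`, it can be indexed either by position (`w[1]`) or by level name (`w["1"]`).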

Also you can post a feature request: https://github.com/hadley/ggplot2/issues

kohske

The current version of ggplot2 (v2.1.0) now has a varwidth option:

library(ggplot2)

data <- data.frame(rbind(cbind(rnorm(700, 0,10), rep("1",700)),
                     cbind(rnorm(50, 0,10), rep("2",50))))
data$X1 <- as.numeric(as.character(data$X1))
# varwidth = TRUE makes box widths reflect each group's sample size
ggplot(data = data, aes(x = X2, y = X1)) +
    geom_boxplot(varwidth = TRUE)

Example output plot from ggplot2
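
As a variation, the example data can be built directly with a numeric column and a factor, which avoids the character-to-numeric conversion. This is only a sketch: the seed is arbitrary and added for reproducibility, the object name `p` is illustrative, and it assumes a ggplot2 version that supports `varwidth`.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the plot is reproducible

# 700 observations in group "1" and 50 in group "2"
data <- data.frame(
  X1 = rnorm(750, mean = 0, sd = 10),
  X2 = factor(rep(c("1", "2"), times = c(700, 50)))
)

# varwidth = TRUE draws each box with a width proportional to the
# square root of the number of observations in its group
p <- ggplot(data, aes(x = X2, y = X1)) +
  geom_boxplot(varwidth = TRUE)

print(p)
```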

Richard Erickson