To use a simple example, let's say:

A = rnorm(10)
B = rnorm(100)
C = rnorm(500)

library(vioplot)
vioplot(A,B,C)

My question is how to create such a graph so that it takes sample size into account. 'C' has a much larger sample size than 'A'; is there a way for the violin for 'C' to be drawn "bigger" than the one for 'A'? I suppose this would be a density scaled across the three classes: even though the distributions of 'A' and 'C' may have the same shape, rather than drawing identical violins, the plot would draw 'A' smaller than both 'B' and 'C' because of its smaller sample size.

  • Hi there! Please make your post reproducible by having a look at [**How to make a great reproducible example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for us to help you. Thank you. – Arun Apr 05 '13 at 21:17
  • There is a good answer and discussion of this over on [CrossValidated](http://stats.stackexchange.com/questions/13555/how-to-scale-violin-plots-for-comparisons). – Stedy Apr 05 '13 at 21:35
  • Thanks Arun, I've edited my question in accordance with what you mentioned. – user2250706 Apr 05 '13 at 21:42
  • Hi Stedy, thanks for the link. I would say the Weighted Areas approach is what I'm looking for. Do you happen to know the code for it, since it doesn't appear to be listed on the page? – user2250706 Apr 05 '13 at 21:57

2 Answers

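By default the sizes will not differ, because a violin plot is a combination of a boxplot and a density/probability plot, and a density estimate is normalized to the same total area whatever the sample size.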

Here is short example:

library(ggplot2)

a) same size

df.ex<-data.frame(G=c(rep('A',100),rep('B',100)),Y=c(rnorm(100),rnorm(100)))
ggplot(data=df.ex,aes(x=G,y=Y)) + geom_violin()

b) different size

df.ex<-data.frame(G=c(rep('A',100),rep('B',1000)),Y=c(rnorm(100),rnorm(1000)))
ggplot(data=df.ex,aes(x=G,y=Y)) + geom_violin()

You can combine it with geom_jitter, which shows how many points there are in each group (the jitter layer is added last so that the violin's fill does not cover the points):

ggplot(data=df.ex,aes(x=G,y=Y)) + geom_violin() + geom_jitter()
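
If the goal is for the violins themselves to reflect sample size, geom_violin() also has a scale argument. The following is only a minimal sketch, assuming a ggplot2 version in which scale = "count" is available; that setting makes the violin areas proportional to the number of observations instead of equal. The data mirror the "different size" case above, and the seed is an arbitrary choice for repeatability.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Same layout as the "different size" example: 100 points in A, 1000 in B
df.ex <- data.frame(
  G = rep(c("A", "B"), times = c(100, 1000)),
  Y = rnorm(1100)
)

# scale = "count": areas proportional to the number of observations per group
# (the default, scale = "area", gives every violin the same area)
p <- ggplot(data = df.ex, aes(x = G, y = Y)) +
  geom_violin(scale = "count")
print(p)
```

With these group sizes the violin for A should come out visibly narrower than the one for B, even though both samples come from the same distribution.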
Maciej
  • Thanks Maciej, the info about implementing geom_jitter was very helpful. There was a link included in one of the comments above from another user that seems quite interesting as well. – user2250706 Apr 05 '13 at 21:56

Unfortunately, vioplot() doesn't accept vectors for some of its parameters. Here's a workaround. The helpful features of vioplot() for this are the at and wex parameters, along with add=TRUE. Basically, plot each violin individually, with parameters that shape it the way you want. You may need to adjust how the sample size is scaled for use with wex.

library(vioplot)

n<-c(100,1000)
# relative widths: sqrt(n) divided by its root mean square (no centering)
size<-scale(sqrt(n),center=FALSE)

x1<-rnorm(n[1])
x2<-rnorm(n[2])

# initialize the plot region (draws only a dotted reference line at y = 0)
plot(0:3,rep(0,4),type='l',xlim=c(0,3),ylim=c(-4,4),ylab="",xlab="",xaxt="n",lty=3)

# fill in the violins at specific x locations using the `wex` parameter for size
vioplot(x1,at=1,wex=size[1],add=TRUE,col="darkgray")
vioplot(x2,at=2,wex=size[2],add=TRUE,col="darkgray")
axis(1,at=1:2,labels=c("Mon","Tues"))
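
The same idea can be applied to the three samples from the question by looping over a list. This is a rough sketch rather than a definitive recipe: violin_wex() and its power argument are hypothetical names introduced here, and the seed is arbitrary. As far as I understand, wex stretches a violin only horizontally, so for similarly shaped distributions the drawn area grows roughly in proportion to wex; power = 1 therefore makes the area track n, while power = 0.5 mirrors the sqrt(n) scaling used above.

```r
library(vioplot)

set.seed(42)  # arbitrary seed so the sketch is repeatable
samples <- list(A = rnorm(10), B = rnorm(100), C = rnorm(500))

# Hypothetical helper: widths relative to the largest group.
# power = 1 gives wex proportional to n; power = 0.5 gives sqrt(n) scaling.
violin_wex <- function(samples, power = 1) {
  n <- vapply(samples, length, integer(1))
  (n / max(n))^power
}

wex <- violin_wex(samples, power = 0.5)

# Empty canvas with one x position per group
plot(0, 0, type = "n",
     xlim = c(0.5, length(samples) + 0.5),
     ylim = range(unlist(samples)),
     xaxt = "n", xlab = "", ylab = "")

# Draw each violin separately so it can have its own width
for (i in seq_along(samples)) {
  vioplot(samples[[i]], at = i, wex = wex[i], add = TRUE, col = "darkgray")
}
axis(1, at = seq_along(samples), labels = names(samples))
```

Which power to use is a judgment call: linear scaling is the most literal reading of "bigger violin for a bigger sample", but it makes the violin for a group of 10 almost invisible next to a group of 500.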

ndoogan