I have successfully created a very nice boxplot (for my purposes) categorized by a factor and binned, according to the answer in my previous post here: ggplot: arranging boxplots of multiple y-variables for each group of a continuous x

Now, I would like to customize the x-axis labels according to the number of observations in each boxplot.

require (ggplot2)
require (plyr)
library(reshape2)

set.seed(1234)
x<- rnorm(100)
y.1<-rnorm(100)
y.2<-rnorm(100)
y.3<-rnorm(100)
y.4<-rnorm(100)

df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
dfmelt<-melt(df, measure.vars = 2:5)

dfmelt$bin <- factor(round_any(dfmelt$x,0.5))

dfmelt.sum<-summary(dfmelt$bin)    

ggplot(dfmelt, aes(x=bin, y=value, fill=variable))+
geom_boxplot()+
facet_grid(.~bin, scales="free")+
labs(x="number of observations")+
scale_x_discrete(labels= dfmelt.sum)

dfmelt.sum only gives me the total number of observations for each bin, not for each boxplot. The statistics returned by boxplot() do give me the number of observations for each boxplot:

dfmelt.stat<-boxplot(value~variable+bin, data=dfmelt)
dfmelt.n<-dfmelt.stat$n
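
The same per-boxplot counts can also be obtained without drawing a base-graphics plot. This is a minimal sketch using base R only. As assumptions on my side, the manual reshaping stands in for melt, round(x / 0.5) * 0.5 stands in for round_any(x, 0.5), and counts is a name introduced here for illustration:

```r
# Rebuild the example data with the same seed and column order as above.
set.seed(1234)
df <- data.frame(x = rnorm(100), y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))

# Long format: one row per (x, variable) pair, as melt() would produce.
dfmelt <- data.frame(
  x        = rep(df$x, times = 4),
  variable = factor(rep(names(df)[-1], each = nrow(df))),
  value    = unlist(df[-1], use.names = FALSE)
)

# Bin x to the nearest 0.5 (what round_any(x, 0.5) computes).
dfmelt$bin <- factor(round(dfmelt$x / 0.5) * 0.5)

# One row per boxplot (variable x bin) with its number of observations.
counts <- as.data.frame(table(variable = dfmelt$variable, bin = dfmelt$bin),
                        responseName = "n")
print(counts)
```

Because every x value appears once per variable, the four boxplots within one bin share the same count in this particular data set.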

But how do I add tick marks and labels for each boxplot?

Thanks, Sina

UPDATE

I have continued working on this. The biggest problem is that in the code above, only one tick mark is provided per facet. Since I also wanted to plot the means for each boxplot, I have used interaction to plot each boxplot individually, which also adds tick marks on the x-axis for each boxplot:

require (ggplot2)
require (plyr)
library(reshape2)

set.seed(1234)
x<- rnorm(100)
y.1<-rnorm(100)
y.2<-rnorm(100)
y.3<-rnorm(100)
y.4<-rnorm(100)

df<- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
dfmelt<-melt(df, measure.vars = 2:5)

dfmelt$bin <- factor(round_any(dfmelt$x,0.5))

dfmelt$f2f1<-interaction(dfmelt$variable,dfmelt$bin)

dfmelt_mean<-aggregate(value~variable*bin, data=dfmelt, FUN=mean)
dfmelt_mean$f2f1<-interaction(dfmelt_mean$variable, dfmelt_mean$bin)

dfmelt_length<-aggregate(value~variable*bin, data=dfmelt, FUN=length)
dfmelt_length$f2f1<-interaction(dfmelt_length$variable, dfmelt_length$bin)

On the side: maybe there is a more elegant way to combine all those interactions. I'd be happy to improve.
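
One possible way to shorten this is to compute the mean and the count in a single aggregate() call and build the interaction once. This is only a sketch under assumptions: it uses base R, the data is rebuilt by hand instead of with melt and round_any, and the name stats is made up for the example:

```r
# Rebuild the example data (same seed and binning rule as above).
set.seed(1234)
df <- data.frame(x = rnorm(100), y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))
dfmelt <- data.frame(
  x        = rep(df$x, times = 4),
  variable = factor(rep(names(df)[-1], each = nrow(df))),
  value    = unlist(df[-1], use.names = FALSE)
)
dfmelt$bin <- factor(round(dfmelt$x / 0.5) * 0.5)

# Mean and count per boxplot in one pass; FUN returns a named vector,
# so aggregate() stores the result as a two-column matrix called "value".
stats <- aggregate(value ~ variable + bin, data = dfmelt,
                   FUN = function(v) c(mean = mean(v), n = length(v)))

# Flatten the matrix column into ordinary columns value.mean and value.n.
stats <- do.call(data.frame, stats)

# A single interaction column identifying each boxplot.
stats$f2f1 <- interaction(stats$variable, stats$bin)
print(head(stats))
```

The column stats$value.mean can then take the place of dfmelt_mean$value and stats$value.n the place of dfmelt_length$value.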

ggplot(aes(y = value, x = f2f1, fill=variable), data = dfmelt)+
geom_boxplot()+
geom_point(aes(x=f2f1, y=value),data=dfmelt_mean, color="red", shape=3)+
facet_grid(.~bin, scales="free")+
labs(x="number of observations")+
scale_x_discrete(labels=dfmelt_length$value)

This gives me a tick mark for each boxplot, which could in principle be labeled. However, using labels in scale_x_discrete only repeats the first four values of dfmelt_length$value in each facet.

How can that be circumvented? Thanks, Sina
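
A likely cause of the repetition is that an unnamed labels vector is assigned by position, and with free scales every facet has only four breaks. As far as I know, scale_x_discrete matches a named labels vector against the break values instead, so naming the counts by the levels of f2f1 should give each facet its own labels. This is a hedged sketch, not a tested answer for every ggplot2 version; n_per_box and n_labels are names invented for illustration, and the data is rebuilt with base R rather than melt and round_any:

```r
library(ggplot2)

# Rebuild the example data (same seed and binning rule as above).
set.seed(1234)
df <- data.frame(x = rnorm(100), y.1 = rnorm(100), y.2 = rnorm(100),
                 y.3 = rnorm(100), y.4 = rnorm(100))
dfmelt <- data.frame(
  x        = rep(df$x, times = 4),
  variable = factor(rep(names(df)[-1], each = nrow(df))),
  value    = unlist(df[-1], use.names = FALSE)
)
dfmelt$bin  <- factor(round(dfmelt$x / 0.5) * 0.5)
dfmelt$f2f1 <- interaction(dfmelt$variable, dfmelt$bin)

# Count observations per boxplot; the result is named by the levels of f2f1.
n_per_box <- vapply(split(dfmelt$value, dfmelt$f2f1), length, integer(1))

# Named character vector: names are break values, elements are the labels.
n_labels <- setNames(as.character(n_per_box), names(n_per_box))

p <- ggplot(dfmelt, aes(x = f2f1, y = value, fill = variable)) +
  geom_boxplot() +
  facet_grid(. ~ bin, scales = "free_x") +   # each facet keeps only its own breaks
  labs(x = "number of observations") +
  scale_x_discrete(labels = n_labels)        # labels looked up by name, not position
print(p)
```

Since the lookup is by name, the labels stay correct even when facets contain different numbers of observations or different numbers of boxplots.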

1 Answer

Look at this answer. It does not put the count on the axis label itself, but it works, and I have used it:

Modify x-axis labels in each facet

You can also do the following, which I have used as well:

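library(ggplot2)
# group is made a factor explicitly so that levels() returns the group names
df <- data.frame(group=factor(sample(c("a","b","c"),100,replace=TRUE)),x=rnorm(100),y=rnorm(100)*rnorm(100))
# one label per group: the group name with its number of observations underneath
xlabs <- paste(levels(df$group),"\n(N=",table(df$group),")",sep="")
ggplot(df,aes(x=group,y=x,color=group))+geom_boxplot()+scale_x_discrete(labels=xlabs)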

This also works

library(ggplot2)
library(reshape2)
library(plyr)  # needed for ddply

df <- data.frame(group=sample(c("a","b","c"),100,replace=TRUE),x=rnorm(100),y=rnorm(100)*rnorm(100))
df1 <- melt(df)
# N = number of observations for each group within each variable
df2 <- ddply(df1,.(group,variable),transform,N=length(group))
df2$label <- paste0(df2$group,"\n","(n=",df2$N,")")
ggplot(df2,aes(x=label,y=value,color=group))+geom_boxplot()+facet_grid(.~variable)

  • In your sample data, you have the same number of x and y values for each group. This is not the case for my sample data. I adapted your approach to my data but it still causes only the first four labels to be repeated for each facet. This is fine with your sample data, but in my case this produces wrong labels. – sina May 28 '14 at 09:11
  • I am not sure I understand, I spent some time looking at your data and added some lines: dfmelt <- ddply(dfmelt,.(bin,variable),transform,N=length(x)) dfmelt$label <- as.character(dfmelt$N) ggplot(aes(y = value, x = label, fill=variable), data = dfmelt)+ geom_boxplot()+stat_summary(fun.y=mean,geom="point", color="red", shape=3)+ facet_grid(.~bin, scales="free")+ labs(x="number of observations") – user1617979 May 28 '14 at 17:10
  • This works for me. It appears to me that, within each bin, every boxplot has the same number of observations. It is also not clear to me whether you want to count by bin or by bin and f2f1 (modify the ddply call accordingly). Finally, you do not need to calculate the mean beforehand; see how I use stat_summary for that. I hope this helps – user1617979 May 28 '14 at 17:14
  • @user1617979 This does not work if facet 1 and facet 2 have a different number of observations for the groups. – Herman Toothrot Nov 03 '21 at 21:40