I want to add count statistics to multiple boxplots for multiple columns with ggplot2. I tried using tally from dplyr, but the count statistics are not right. How can I fix this?

Reproducible data and my attempt:

Basically, I want to add count statistics such as tot_#_Tool_A, tot_#_Tool_B, and tot_# to each subplot. How can I do that in R? Thanks.

ID <- c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57")
Tool <- c("Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_B", "Tool_B", "Tool_B", "Tool_B", "Tool_B")
Name <- c("CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP")
MS1 <- c(51,55,50,59,50,47,48,42,43,46)
MS2 <- c(13,11,14,11,10,17,18,17,20,21)
MS3 <- c(2,3,2,5,6,4,9,6,4,4)
MS4 <- c(16,13,14,11,16,16,18,16,19,15)
MS5 <- c(3,6,3,6,3,4,4,8,5,4)
MS6 <- c(7,7,5,5,8,9,8,6,6,9)

df <- data.frame(ID,Tool,Name,MS1,MS2,MS3,MS4,MS5,MS6)

My updated attempt:

library(reshape2)
library(dplyr)

df1_long <- melt(df, id.vars=c("ID","Tool"))
df1_long %>% group_by(Tool, variable)%>%
    tally %>% ungroup %>% as.data.frame() %>%
    setNames(c("tool", "cat_vars", "count")) %>%
    {
        bind_rows(., setNames(., c("tool", "cat_vars", "count")))
    } %>% as.data.frame() %>%
    ggplot(aes(x=tool,y=count,fill=tool))+
    geom_boxplot() + labs(title="CMP") +facet_wrap(~variable)

But I didn't get the expected boxplot: the count statistics I was expecting didn't show up. What is the issue in my code? Thanks.

Goal:

I want to add count statistics such as tot_#_Tool_A, tot_#_Tool_B, and tot_# to each subplot.

Desired output:

I am trying to get a plot like the one in this post, with tot_#_Tool_A, tot_#_Tool_B, and tot_# placed at the top of each subplot. How can I make this happen? Thanks.

beyond_inifinity

1 Answer

I am guessing tot_#_Tool_A is the total of the values for Tool_A, tot_#_Tool_B is the total for Tool_B, and tot_# is the grand total. tally() doesn't work here because it only counts the number of rows in each group, which isn't informative for your data: every Tool/variable combination has exactly 5 entries.
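To make the difference concrete, here is a minimal sketch that runs both summaries on the question's data. The object names `long`, `n_rows`, and `totals` are made up for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Data from the question, restated so the snippet runs on its own
df <- data.frame(
  ID   = c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57"),
  Tool = rep(c("Tool_A", "Tool_B"), each = 5),
  Name = "CMP",
  MS1  = c(51,55,50,59,50,47,48,42,43,46),
  MS2  = c(13,11,14,11,10,17,18,17,20,21),
  MS3  = c(2,3,2,5,6,4,9,6,4,4),
  MS4  = c(16,13,14,11,16,16,18,16,19,15),
  MS5  = c(3,6,3,6,3,4,4,8,5,4),
  MS6  = c(7,7,5,5,8,9,8,6,6,9)
)

long <- df %>% pivot_longer(MS1:MS6)

# tally() counts rows per group: n is 5 for every Tool/name combination
n_rows <- long %>%
  group_by(Tool, name) %>%
  tally() %>%
  ungroup()

# sum() adds up the measured values, which is what the totals need
totals <- long %>%
  group_by(Tool, name) %>%
  summarise(total = sum(value), .groups = "drop")
```

Comparing `n_rows` and `totals` side by side shows why a boxplot of the tally output is flat: there is only one distinct count per group.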

So first we compute the totals per subcategory. In principle they could live in one data frame, but to keep the annotation simple, let's keep them separate:

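library(tidyr)
library(dplyr)
library(ggplot2)

# Per Tool and variable: a y position just above the highest point, and the sum of the values
counts <- df %>%
    pivot_longer(MS1:MS6) %>%
    group_by(Tool, name) %>%
    summarise(pos = max(value) + 2, value = sum(value))

# Per variable: the grand total over both tools, placed above the per-tool labels
totalcounts <- counts %>%
    group_by(name) %>%
    summarise(pos = max(pos) + 5, value = paste("total=", sum(value)))

# Boxplots of the raw values, annotated from the two summary tables
df %>%
    pivot_longer(MS1:MS6) %>%
    ggplot(aes(x = Tool, y = value)) +
    geom_boxplot() +
    facet_wrap(~name) +
    geom_text(data = counts, aes(y = pos, label = value)) +
    geom_text(data = totalcounts, aes(x = 1.5, y = pos, label = value), col = "blue")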

In the solution above, I used the maximum of each subcategory to set the y position of the per-tool labels, and placed the total horizontally between the two boxes, above those labels. You can adjust the positions as needed.
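One possible way to adjust the positions is to pin the labels to the top edge of each panel rather than deriving them from the data. The sketch below is only one option, not part of the original solution: it maps `y = Inf`, uses `vjust` to push the text back inside the panel, and adds headroom with `expansion()` (which assumes ggplot2 3.3.0 or later). The `vjust` and expansion values are guesses to tune by eye.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question, restated so the snippet runs on its own
df <- data.frame(
  ID   = c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57"),
  Tool = rep(c("Tool_A", "Tool_B"), each = 5),
  Name = "CMP",
  MS1  = c(51,55,50,59,50,47,48,42,43,46),
  MS2  = c(13,11,14,11,10,17,18,17,20,21),
  MS3  = c(2,3,2,5,6,4,9,6,4,4),
  MS4  = c(16,13,14,11,16,16,18,16,19,15),
  MS5  = c(3,6,3,6,3,4,4,8,5,4),
  MS6  = c(7,7,5,5,8,9,8,6,6,9)
)

long <- df %>% pivot_longer(MS1:MS6)

# Per-tool totals and per-panel grand totals, without data-derived positions
counts <- long %>%
  group_by(Tool, name) %>%
  summarise(value = sum(value), .groups = "drop")

totalcounts <- counts %>%
  group_by(name) %>%
  summarise(label = paste("total =", sum(value)), .groups = "drop")

p <- ggplot(long, aes(x = Tool, y = value)) +
  geom_boxplot() +
  facet_wrap(~name) +
  # y = Inf anchors the text at the top edge; vjust > 1 moves it down into the panel
  geom_text(data = counts, aes(y = Inf, label = value), vjust = 3) +
  geom_text(data = totalcounts, aes(x = 1.5, y = Inf, label = label),
            vjust = 1.5, colour = "blue") +
  # extra space above the boxes so the labels do not overlap them
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.25)))

print(p)
```

With this layout all labels sit at the same height in every panel, at the cost of some empty space above the smaller boxes.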

StupidWolf
  • Another thought: what if I just need counts, for instance tot_#MS1_Tool_A=5, tot_#_MS1_Tool_B=5, tot_#_A_B = 10? I tried using `count` from plyr; is that the correct way to do it? Thanks for your help. – beyond_inifinity Mar 27 '20 at 14:12
  • I also tried `summarize(count = n())`, but I got this error: `Error in FUN(X[[i]], ...) : object 'value' not found`. – beyond_inifinity Mar 27 '20 at 14:42
  • I also changed `summarize(pos=max(value)+2,value=count(value))` to `summarize(pos=count(value)+2,value=count(value))`, but I get an error. Any idea? Thanks. – beyond_inifinity Mar 27 '20 at 14:55
  • Can you try this? `df %>% pivot_longer(MS1:MS6) %>% count(Tool,name)`. Sorry, it's sometimes a bit hard to follow comments like this. – StupidWolf Mar 27 '20 at 15:05
  • I think that should work, but when I ran `df%>% pivot_longer(MS1:MS6) %>% count(Tool,name) %>% ggplot(aes(x=Tool,y=value)) + geom_boxplot() + facet_wrap(~name) + geom_text(data=n,aes(y=pos,label=value))`, it threw an error. Why? – beyond_inifinity Mar 27 '20 at 15:20
  • Because you are passing the result of `df %>% pivot_longer(MS1:MS6) %>% count(Tool,name)` to ggplot, and that is a count summary in which the column `value` doesn't exist. – StupidWolf Mar 27 '20 at 15:21
  • https://r4ds.had.co.nz/pipes.html I think it helps to familiarize yourself with pipes before doing more advanced stuff. – StupidWolf Mar 27 '20 at 15:23
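
For the count-only variant discussed in the comments, the sketch below keeps the long data for the boxplots and stores the counts in separate data frames that are passed to `geom_text()` through `data`, which avoids piping the count summary into `ggplot()`. The names `counts_n` and `total_n` are invented for illustration, `n()` is dplyr's row counter, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question, restated so the snippet runs on its own
df <- data.frame(
  ID   = c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57"),
  Tool = rep(c("Tool_A", "Tool_B"), each = 5),
  Name = "CMP",
  MS1  = c(51,55,50,59,50,47,48,42,43,46),
  MS2  = c(13,11,14,11,10,17,18,17,20,21),
  MS3  = c(2,3,2,5,6,4,9,6,4,4),
  MS4  = c(16,13,14,11,16,16,18,16,19,15),
  MS5  = c(3,6,3,6,3,4,4,8,5,4),
  MS6  = c(7,7,5,5,8,9,8,6,6,9)
)

long <- df %>% pivot_longer(MS1:MS6)

# Number of observations per Tool and variable, plus a label position
counts_n <- long %>%
  group_by(Tool, name) %>%
  summarise(n = n(), pos = max(value) + 2, .groups = "drop")

# Number of observations per variable (both tools together)
total_n <- counts_n %>%
  group_by(name) %>%
  summarise(n = sum(n), pos = max(pos) + 5, .groups = "drop")

# The boxplots use the long data; only the text layers use the count tables
p <- ggplot(long, aes(x = Tool, y = value)) +
  geom_boxplot() +
  facet_wrap(~name) +
  geom_text(data = counts_n, aes(y = pos, label = n)) +
  geom_text(data = total_n, aes(x = 1.5, y = pos, label = paste("total =", n)),
            colour = "blue")

print(p)
```

If plyr is attached after dplyr, its `count()` and `summarise()` mask the dplyr versions, so calling `dplyr::count()` explicitly may be safer in that setup.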