3

I want to plot the p value of Kruskal-Wallis test to my ggplot using the R function stat_compare_means from the package ggpubr.

However, the plotted value is different from the value if I simply run the function:

kruskal.test(value ~ type, data = Profile_melt)

my code to plot the p value is:

ggplot(Profile_melt, aes(type, value)) + 
  geom_boxplot(aes(fill = factor(type), alpha = 0.5), 
               outlier.shape = NA, show.legend = FALSE) +
  geom_jitter(width = 0.2, size = 2, show.legend = FALSE,
              aes(colour = factor(type)), alpha = 0.5) +
  theme_bw() +
  facet_grid(Case ~ Marker, scales = 'free') +
  stat_compare_means(comparison = list(c("Real", "Binomial")),method = 'kruskal.test')+
  background_grid(major = 'y', minor = "none") + # add thin horizontal lines 
  xlab('Category') +
  ylab('Cell counts (Frequencies)')+
  theme(axis.text = element_text(size = 15), 
        axis.title = element_text(size = 20), 
        legend.text = element_text(size = 38),
        legend.title = element_text(size = 30), 
        strip.background = element_rect(colour="black", fill="white"),
        strip.text = element_text(margin = margin(10, 10, 10, 10), size = 25)) +
  panel_border()

Here is my data sample data

TobiO
  • 1,335
  • 1
  • 9
  • 24
DigiPath
  • 179
  • 2
  • 10

2 Answers2

2

There are many code lines which may not be relevant to the question. Perhaps, your question could be:

why does

kruskal.test(value ~ type, data = Profile_melt)

#Kruskal-Wallis chi-squared = 4.9673, df = 1, p-value = 0.02583

produce a different p value from

ggboxplot(Profile_melt, x="type", y = "value") + 
  stat_compare_means(comparison = list(c("Real", "Binomial")), method = 'kruskal.test')

# p-value = 0.49

You could work out the reason by checking original code. The developer of ggpubr may explain this better, and perhaps fix it there if it is an issue. To get correct and consistent p value, remove comparison = list(c("Real", "Binomial")):

ggboxplot(Profile_melt, x="type", y = "value") + 
  stat_compare_means(method = 'kruskal.test')

or

Edit

ggboxplot(Profile_melt, x="type", y = "value") + 
  stat_compare_means(comparison = list(c("Real", "Binomial")))

With your other code, the graph looks like this:

enter image description here

Community
  • 1
  • 1
Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27
  • Hi Zhiqiang, thank you for your reply. Remove this comparison = ... line do produce the consistent result, however, the format of the plot will also be altered, which is not what I want. – DigiPath Dec 26 '19 at 23:58
  • I see. You want to have the nice looking horizontal line. You can achieve that by remove `method = 'kruskal.test'` instead of `comparison = list(c("Real", "Binomial")`. I will revise my answer. – Zhiqiang Wang Dec 27 '19 at 00:25
  • Please run the full codes that I provide. Your suggestion does not work when in facet. – DigiPath Dec 27 '19 at 00:29
  • I have run the code on my side. It works in the same way with `facet`, because you have small sample, it produces some warnings `cannot compute exact p-value with ties`. As I said, the real fix is perhaps to change `ggpubr`. – Zhiqiang Wang Dec 27 '19 at 00:49
  • hummm that's werid...could you please provide the full codes on your side(with facet)? Thank you – DigiPath Dec 27 '19 at 00:50
  • I did not change any your other code, just removed either `comparison = list(c("Real", "Binomial"))` or `method = 'kruskal.test'`. – Zhiqiang Wang Dec 27 '19 at 00:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/204913/discussion-between-digipath-and-zhiqiang-wang). – DigiPath Dec 27 '19 at 00:58
1

stat_compare_means from ggpubr calls compare_means which uses wilcox.test by default. So as @ZhiqiangWang pointed out, if you remove the method or the comparison it goes to the default which is similar to the p-value you got in the first place, because wilcoxon and kruskal for 2 sample are very similar:

kruskal.test(value ~ type, data = Profile_melt)
#Kruskal-Wallis chi-squared = 4.9673, df = 1, p-value = 0.02583
wilcox.test(value ~ type, data = Profile_melt)
#W = 1034939, p-value = 0.02583

Now, for the data you have, you most likely want a p-value for each of the separate case and marker, and not a pan comparison using kruskal.test(value ~ type, data = Profile_melt). It doesn't make sense to print a same p-value for all the facets.

We first check the p-values we need:

compare_means(value ~ type, Profile_melt, group.by = c("Case","Marker"),
method="kruskal")
# A tibble: 30 x 8
   Case    Marker .y.            p   p.adj p.format p.signif method        
   <fct>   <fct>  <chr>      <dbl>   <dbl> <chr>    <chr>    <chr>         
 1 Case 1A CD3    value 0.000470   0.0085  0.00047  ***      Kruskal-Wallis
 2 Case 1A CD4    value 0.00000915 0.00022 9.2e-06  ****     Kruskal-Wallis
 3 Case 1A CD8    value 0.00695    0.09    0.00695  **       Kruskal-Wallis
 4 Case 1A CD20   value 0.707      1       0.70724  ns       Kruskal-Wallis
 5 Case 1A FoxP3  value 0.00102    0.014   0.00102  **       Kruskal-Wallis
 6 Case 1B CD3    value 0.0000415  0.00091 4.1e-05  ****     Kruskal-Wallis

which is similar to:

Profile_melt %>% 
group_by(Case,Marker) %>% 
summarize(k_p=kruskal.test(value ~ type)$p.value)

# A tibble: 30 x 3
# Groups:   Case [6]
   Case    Marker        k_p
   <fct>   <fct>       <dbl>
 1 Case 1A CD3    0.000470  
 2 Case 1A CD4    0.00000915
 3 Case 1A CD8    0.00695   
 4 Case 1A CD20   0.707     
 5 Case 1A FoxP3  0.00102   

And we can plot, it's must easier to use the ggboxplot from ggpubr package:

p = ggboxplot(Profile_melt,x="type",y="value",add="jitter",
facet.by=c("Case","Marker"),scales="free_y",ggtheme=theme_pubclean())

p+stat_compare_means(
aes(label =paste("p=",scientific(as.numeric(..p.format..)))),
method="kruskal",size=2)

enter image description here

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Thank you so much! That's exactly what I want. Just one more thing, I want the horizontal compare line in each facet and I want to unify the number format to scientific notation (1.23e-3, etc). How could I do this? – DigiPath Dec 27 '19 at 16:50
  • I guess you mean the grid lines, for that you have to either choose a theme using the ggtheme option in ggboxplot, or you set it using + theme(..) . As for scientific, you can see what I have above, unfortunately ggpubr already has it at 1 significant figure, so I cannot do much about it – StupidWolf Dec 27 '19 at 17:46
  • This is a bit outside the scope of your question, which is about p-values. So if you need a lot more tinkering with refining the plot, I would suggest you post it as a separate question.. – StupidWolf Dec 27 '19 at 17:48