A good way for pairwise t-test for many rows? (R)

Question

I'm writing a code that, from a table of raw data (output of microbiome QIIME analysis), generates a t-test per group of all the rows. Every row has a bacteria and the values corresponding for every sample. The table can be huge, like 80 columns per 400 rows.

Group_phylum_data:

label_Group Bacteria_Firmicutes Archaea_Other Archaea_Euryarchaeota Bacteria_Other
HC       6.771703e-05             0           0.000000000   9.480385e-04
HC       3.362588e-05             0           0.016835356   5.604313e-05
HC       0.000000e+00             0           0.000000000   2.209945e-04
EPI       0.000000e+00             0           0.001121252   2.466755e-04
EPI       0.000000e+00             0           0.000000000   3.335038e-04

So now these are just the first lines with 2 groups (HC and EPI). I want to run a t-test for each bacteria in the columns among the groups. I've found this pairwise_t_test from the rstatix package and it does exactly what I want, returning also the adjusted p-value. Since the groups can be more than 2, I chose this pairwise_t_test because it can handle them and perform the stats for every combination.

pwc1 <- Group_phylum_data %>%
  pairwise_t_test(Bacteria_Firmicutes ~ label_Group, p.adjust.method = "bonferroni")
pwc1

The problem is that I can't find a way to make it a loop for entering each bacteria name and obtain a complete table with a bacteria per row and the stats in the corresponding columns, something like

.y.                    group1 group2    n1    n2     p p.signif p.adj p.adj.signif
  <chr>                  <chr>  <chr>  <int> <int> <dbl> <chr>    <dbl> <chr>       
1 Bacteria_Firmicutes    EPI    HC        46    28 0.82  ns       0.82  ns          
2 Archaea_Other EPI    HC        46    28 0.453 ns       0.453 ns

which I obtained by manually performing the analysis inserting the bacteria names. I tried to save the names in an array and substitute the single name (in the example, "Bacteria_Firmicutes") with something like names[i] but it doesn't work. Maybe that's a limit of this script, which only works with a specific name... or maybe I did something wrong? Or, is there another and maybe better way to obtain the output I want for this long dataset? Thank you!

These might help https://stackoverflow.com/a/52011158/786542, https://stackoverflow.com/a/49484854/786542 and https://stackoverflow.com/a/48908432/786542 — Tung, Jul 10 '20 at 22:43

score 0 · Accepted Answer · answered Jul 10 '20 at 22:59

You can try this (Archaea_Other is zero, so no output is produced). I hope this helped.

library(reshape2)
library(rstatix)
#Melt
Melted <- reshape2::melt(data,id.vars = 'label_Group')
#Stat test
pwc1 <- Melted %>% group_by(variable) %>% 
  pairwise_t_test(value ~ label_Group, p.adjust.method = "bonferroni")

# A tibble: 3 x 10
  variable              .y.   group1 group2    n1    n2     p p.signif p.adj p.adj.signif
* <fct>                 <chr> <chr>  <chr>  <int> <int> <dbl> <chr>    <dbl> <chr>       
1 Bacteria_Firmicutes   value EPI    HC         2     3 0.273 ns       0.273 ns          
2 Archaea_Euryarchaeota value EPI    HC         2     3 0.536 ns       0.536 ns          
3 Bacteria_Other        value EPI    HC         2     3 0.761 ns       0.761 ns

Data

data <- structure(list(label_Group = c("HC", "HC", "HC", "EPI", "EPI"
), Bacteria_Firmicutes = c(6.771703e-05, 3.362588e-05, 0, 0, 
0), Archaea_Other = c(0L, 0L, 0L, 0L, 0L), Archaea_Euryarchaeota = c(0, 
0.016835356, 0, 0.001121252, 0), Bacteria_Other = c(0.0009480385, 
5.604313e-05, 0.0002209945, 0.0002466755, 0.0003335038)), class = "data.frame", row.names = c(NA, 
-5L))

It works perfectly, thank you!! I didn't know Melt, what does it do exactly? Links together the names, so that they'll be seen as one? — camcecc10, Jul 11 '20 at 00:45
@camcecc10 Great! `melt` transform your data into long format so you can have all your variables listed in one column against other vars like `ids` and all values are also saved in another column. I hope this helped :) — Duck, Jul 11 '20 at 00:58

A good way for pairwise t-test for many rows? (R)

1 Answers1