I think you are looking for the function wilcox.test (in the stats package).
Solution 1: Using a for loop
One way to compare each row of A with the corresponding row of B (and extract the p-value) is to write a for loop such as this:
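# One p-value per row of A
pval <- numeric(nrow(A))
for (i in seq_len(nrow(A))) {
  # Values for the i-th name in A (every column except the name column)
  vec_a <- as.numeric(A[i, 2:ncol(A)])
  # Matching row of B, looked up by name
  vec_b <- as.numeric(B[B$B == A$A[i], 2:ncol(B)])
  p <- wilcox.test(vec_a, vec_b)
  pval[i] <- p$p.value
  print(p)  # optional: show the full test result
}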
At the end, you will get a vector pval containing the p-value for each row.
pval
[1] 0.1333333 0.2188194 0.5838824 1.0000000
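If you want to try this without your original data, here is a minimal self-contained sketch. The data frames A and B are reconstructed from the joined table shown further down in this answer, so the row order of B is an assumption, and row_pvalue is just a hypothetical helper name. It uses vapply instead of an explicit loop, and suppressWarnings hides the warning that wilcox.test gives when the values contain ties:

```r
# Example data, reconstructed from the joined table in this answer (assumption)
A <- data.frame(A  = c("John", "Dora", "Robert", "Jim"),
                A1 = c(8, 1, 10, 5),
                A2 = c(9, 1, 1, 4),
                stringsAsFactors = FALSE)
B <- data.frame(B  = c("John", "Dora", "Robert", "Jim"),
                B1 = c(1, 1, 1, 5),
                B2 = c(3, 2, 1, 5),
                B3 = c(4, 3, 1, 5),
                B4 = c(6, 8, 8, 1),
                stringsAsFactors = FALSE)

# Hypothetical helper: p-value for the i-th row of A against the matching row of B
row_pvalue <- function(i) {
  vec_a <- as.numeric(A[i, 2:ncol(A)])
  vec_b <- as.numeric(B[B$B == A$A[i], 2:ncol(B)])
  # With ties, wilcox.test cannot compute an exact p-value and warns about it
  suppressWarnings(wilcox.test(vec_a, vec_b)$p.value)
}

# Apply the helper to every row and label the results by name
pval <- vapply(seq_len(nrow(A)), row_pvalue, numeric(1))
names(pval) <- A$A
pval
```

Naming the vector makes it easier to see which p-value belongs to which person.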
Solution 2: Using tidyverse
A more elegant solution is to use the tidyverse packages (in particular dplyr and tidyr) to combine your dataframes into a single one, and then compare the A and B values for each name by passing a formula to the function wilcox.test.
First, we can merge your dataframes by name using the left_join function from dplyr:
library(dplyr)
DF <- left_join(A,B, by = c("A"="B"))
A A1 A2 B1 B2 B3 B4
1 John 8 9 1 3 4 6
2 Dora 1 1 1 2 3 8
3 Robert 10 1 1 1 1 8
4 Jim 5 4 5 5 5 1
Then, using the dplyr and tidyr packages, you can reshape your dataframe into a longer format:
library(dplyr)
library(tidyr)
DF %>% pivot_longer(., -A, names_to = "var", values_to = "values")
# A tibble: 24 x 3
A var values
<fct> <chr> <dbl>
1 John A1 8
2 John A2 9
3 John B1 1
4 John B2 3
5 John B3 4
6 John B4 6
7 Dora A1 1
8 Dora A2 1
9 Dora B1 1
10 Dora B2 2
# … with 14 more rows
Next, we create a new column "group" that indicates A or B depending on the values in the column var:
DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
mutate(group = gsub("\\d","",var))
# A tibble: 24 x 4
A var values group
<fct> <chr> <dbl> <chr>
1 John A1 8 A
2 John A2 9 A
3 John B1 1 B
4 John B2 3 B
5 John B3 4 B
6 John B4 6 B
7 Dora A1 1 A
8 Dora A2 1 A
9 Dora B1 1 B
10 Dora B2 2 B
# … with 14 more rows
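The pattern "\\d" matches a single digit, so gsub strips every digit from the column name and leaves only the letter prefix. A quick sketch on a few of the names from the var column (vars is just an illustrative vector, not part of your data):

```r
# A few column names as they appear in the long-format "var" column
vars <- c("A1", "A2", "B1", "B4")

# Remove all digits, keeping only the group letter
group <- gsub("\\d", "", vars)
group
# [1] "A" "A" "B" "B"
```

This relies on the column names being a group prefix followed by digits; if your real names contain digits inside the prefix, you would need a different pattern.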
Finally, we can group by A and summarise the dataframe to get the p-value of wilcox.test when comparing the values of the two groups for each name:
DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
mutate(group = gsub("\\d","",var)) %>%
group_by(A) %>%
summarise(Pval = wilcox.test(values~group)$p.value)
# A tibble: 4 x 2
A Pval
<fct> <dbl>
1 Dora 0.219
2 Jim 1
3 John 0.133
4 Robert 0.584
It looks longer (especially because I explain each step), but in the end you can see that it needs fewer lines than the first solution.
Does this answer your question?