This is my first post here.

I have 4 dataframes, and I would like to run a nonparametric test on each row.

E.g., I would like to compare the values in each row of dataframe A with the values in the corresponding row of dataframe B.

I need a nonparametric test, e.g. Wilcoxon or something similar.

I thought of making a new column with the median, but I am certain that there is something better.

Could you give me an idea how to do this?

Thank you in advance!

Edit: Here are my imaginary dataframes.

I want to compare the dataframes row-wise, e.g. run a nonparametric test for John in dataframes A and B, then for Dora, etc.

A <- data.frame("A" = c("John","Dora","Robert","Jim"), 
                "A1" = c(8,1,10,5), 
                "A2"= c(9,1,1,4))
B <- data.frame("B" = c("John","Dora","Robert","Jim"), 
                "B1" = c(1,1,1,5), 
                "B2"= c(3,2,1,5), 
                "B3"=c(4,3,1,5), 
                "B4"=c(6,8,8,1))
dc37
  • Welcome to SO! Instead of pasting an image of your data, can you provide an example of your data in plain text? It will be much easier for people to copy and paste it. See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – dc37 Jan 22 '20 at 14:12
  • Thanks for adding your imaginary dataframes. I edited my answer to use those. – dc37 Jan 22 '20 at 15:19

1 Answer

I think you are looking for the function wilcox.test (in the stats package).

Solution 1: Using a for loop

One way to compare each row of A with the corresponding row of B (and extract the p value) is to create a for loop such as this:

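# Preallocate one p-value per row of A
pval <- numeric(nrow(A))
for (i in seq_len(nrow(A))) {
    # Values for person i in A, and for the same person (matched by name) in B
    vec_a <- as.numeric(A[i, 2:ncol(A)])
    vec_b <- as.numeric(B[B$B == A$A[i], 2:ncol(B)])

    # Unpaired two-sample Wilcoxon rank-sum test; R warns when the data contain ties
    p <- wilcox.test(vec_a, vec_b)
    pval[i] <- p$p.value
    print(p)
}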

At the end, you will get a vector pval containing the p-value for each row, in the row order of A:

pval
[1] 0.1333333 0.2188194 0.5838824 1.0000000
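
The same loop can also be written without an explicit index. The following is only a sketch of one possible variant: row_pval is a hypothetical helper name, stringsAsFactors = FALSE is my assumption (so that names are plain character strings), and suppressWarnings() merely hides the warning R emits when the data contain ties.

```r
# Example data from the question, with names kept as character strings
A <- data.frame(A = c("John", "Dora", "Robert", "Jim"),
                A1 = c(8, 1, 10, 5),
                A2 = c(9, 1, 1, 4),
                stringsAsFactors = FALSE)
B <- data.frame(B = c("John", "Dora", "Robert", "Jim"),
                B1 = c(1, 1, 1, 5),
                B2 = c(3, 2, 1, 5),
                B3 = c(4, 3, 1, 5),
                B4 = c(6, 8, 8, 1),
                stringsAsFactors = FALSE)

# Hypothetical helper: rank-sum p-value for one name, matched in both dataframes
row_pval <- function(name) {
  vec_a <- as.numeric(A[A$A == name, -1])
  vec_b <- as.numeric(B[B$B == name, -1])
  # Hide the "cannot compute exact p-value with ties" warning
  suppressWarnings(wilcox.test(vec_a, vec_b)$p.value)
}

# vapply() returns a numeric vector named after the elements of A$A
pval <- vapply(A$A, row_pval, numeric(1))
pval
```

Because the result is named, a single p-value can be looked up with pval[["John"]] instead of by position.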

Solution 2: Using tidyverse

A more elegant solution is to use the tidyverse packages (in particular dplyr and tidyr) to combine your dataframes into a single one, then run the test for each name by grouping and passing a formula to wilcox.test.

First, we can merge your dataframes by their name using left_join function from dplyr:

library(dplyr)
DF <- left_join(A,B, by = c("A"="B"))

       A A1 A2 B1 B2 B3 B4
1   John  8  9  1  3  4  6
2   Dora  1  1  1  2  3  8
3 Robert 10  1  1  1  1  8
4    Jim  5  4  5  5  5  1
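
Before relying on the join, it may be worth checking that both dataframes contain the same names, since a name missing from B would produce NA values in the joined row. Here is a minimal base R sketch of that check, followed by merge() as a rough base R counterpart of the join. The names only_in_A, only_in_B and DF_base are made up for this illustration.

```r
# Example data from the question, with names kept as character strings
A <- data.frame(A = c("John", "Dora", "Robert", "Jim"),
                A1 = c(8, 1, 10, 5),
                A2 = c(9, 1, 1, 4),
                stringsAsFactors = FALSE)
B <- data.frame(B = c("John", "Dora", "Robert", "Jim"),
                B1 = c(1, 1, 1, 5),
                B2 = c(3, 2, 1, 5),
                B3 = c(4, 3, 1, 5),
                B4 = c(6, 8, 8, 1),
                stringsAsFactors = FALSE)

# Names present in one dataframe but not in the other (both empty here)
only_in_A <- setdiff(A$A, B$B)
only_in_B <- setdiff(B$B, A$A)

# Base R counterpart of the left join; all.x = TRUE keeps every row of A.
# Note that merge() sorts the rows by the key column by default.
DF_base <- merge(A, B, by.x = "A", by.y = "B", all.x = TRUE)
DF_base
```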

Then, using pivot_longer from tidyr, you can reshape the dataframe into a longer format with one row per value:

library(dplyr)
library(tidyr)
DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") 

# A tibble: 24 x 3
   A     var   values
   <fct> <chr>  <dbl>
 1 John  A1         8
 2 John  A2         9
 3 John  B1         1
 4 John  B2         3
 5 John  B3         4
 6 John  B4         6
 7 Dora  A1         1
 8 Dora  A2         1
 9 Dora  B1         1
10 Dora  B2         2
# … with 14 more rows

Next, we create a new column "group" that contains A or B, depending on the value in the column var:

DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
  mutate(group = gsub("\\d","",var))

# A tibble: 24 x 4
   A     var   values group
   <fct> <chr>  <dbl> <chr>
 1 John  A1         8 A    
 2 John  A2         9 A    
 3 John  B1         1 B    
 4 John  B2         3 B    
 5 John  B3         4 B    
 6 John  B4         6 B    
 7 Dora  A1         1 A    
 8 Dora  A2         1 A    
 9 Dora  B1         1 B    
10 Dora  B2         2 B    
# … with 14 more rows
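
To see what the gsub call does in isolation, here is a small standalone sketch: the pattern "\\d" matches every digit, and replacing each match with an empty string leaves only the letter prefix. The vector var below simply repeats the column names from the example.

```r
# Column names as they appear in the "var" column for one person
var <- c("A1", "A2", "B1", "B2", "B3", "B4")

# Remove every digit, keeping only the dataframe prefix
group <- gsub("\\d", "", var)
group
```

This relies on the column names being a letter prefix followed by digits, as in the example data. Names that follow a different pattern would need a different rule.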

Finally, we can group by A and summarise the dataframe to get, for each name, the p-value returned by wilcox.test when comparing the values of group A with those of group B:

DF %>% pivot_longer(., -A, names_to = "var", values_to = "values") %>%
  mutate(group = gsub("\\d","",var)) %>%
  group_by(A) %>%
  summarise(Pval = wilcox.test(values~group)$p.value)

# A tibble: 4 x 2
  A       Pval
  <fct>  <dbl>
1 Dora   0.219
2 Jim    1    
3 John   0.133
4 Robert 0.584
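
One thing to keep in mind when reading these p-values: with only 2 values in A and 4 in B, the exact two-sided rank-sum test cannot return anything smaller than 2 / choose(6, 2), which is about 0.133. The sketch below illustrates this with John's values, which happen to be the most extreme arrangement possible (both A values are larger than every B value). The variable names are chosen for this illustration only.

```r
# Sample sizes per row in the example: 2 values from A, 4 values from B
n_a <- 2
n_b <- 4

# Smallest possible exact two-sided p-value: 2 out of choose(6, 2) equally likely rank arrangements
min_p <- 2 / choose(n_a + n_b, n_a)
min_p

# John's row: no ties, and both A values exceed every B value
p_john <- wilcox.test(c(8, 9), c(1, 3, 4, 6))$p.value
p_john
```

So a p-value of 0.133 is the lowest this design can produce, and none of the rows can reach the usual 0.05 threshold regardless of how different the values are.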

It looks longer (mostly because I explain each step), but in the end it needs fewer lines of code than the first solution.

Does this answer your question?

dc37