Wilcoxon test with two pandas DataFrames of different sizes

Question

I have two DataFrames of different lengths (one has 16 rows and the other 28). I want to run a Wilcoxon test between the two using scipy.stats.wilcoxon. For this I have written a function:

from scipy import stats

def wilcoxon_test(df1, df2):
  # Compare every column whose name appears in both DataFrames.
  for name in df1.columns:
    if name in df2.columns:
      # stats.wilcoxon is a paired test: both samples must have the same length.
      stat, pvalue = stats.wilcoxon(df1[name], df2[name])
      print("Wilcoxon test of {} and {}: stat = {}, pvalue = {}".format(name, name, stat, pvalue))
      if pvalue < 0.01:
        print("Pvalue between {} and {} < 0.01".format(name, name))

It works well when the data have the same size, but my DataFrames have different sizes, and it raises this error: ValueError: The samples x and y must have the same length.

I've seen in this post, which discusses the same issue in R, that you can do it by passing paired = FALSE. Doing so is equivalent to running a Mann-Whitney test.

Is there a way to do the same in Python with scipy.stats.wilcoxon, or should I use scipy.stats.mannwhitneyu directly?

Thanks

score 1 · Answer 1 · answered Nov 22 '21 at 10:02

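If you want an unpaired Wilcoxon test, mannwhitneyu is the right choice. scipy.stats.wilcoxon implements the Wilcoxon signed-rank test, which needs paired samples of equal length; the unpaired version (the Wilcoxon rank-sum test) is equivalent to the Mann-Whitney U test. The SciPy documentation for mannwhitneyu puts it this way: mannwhitneyu is for independent samples. For related / paired samples, consider scipy.stats.wilcoxon.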
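
As a rough sketch of how the function from the question could be adapted, assuming the two DataFrames hold independent samples and that comparing columns with matching names is what you want. The function name mannwhitney_by_column, the alpha parameter, the dropna() step, and the random data are all made up for illustration; only scipy.stats.mannwhitneyu comes from the discussion above.

```python
import numpy as np
import pandas as pd
from scipy import stats


def mannwhitney_by_column(df1, df2, alpha=0.01):
    """Run a Mann-Whitney U test on every column name shared by df1 and df2.

    Hypothetical helper: returns {column: (statistic, pvalue, pvalue < alpha)}.
    """
    results = {}
    for name in df1.columns.intersection(df2.columns):
        # Assumption: missing values should be dropped before testing.
        # The two samples are allowed to have different lengths.
        x = df1[name].dropna()
        y = df2[name].dropna()
        stat, pvalue = stats.mannwhitneyu(x, y, alternative="two-sided")
        results[name] = (stat, pvalue, pvalue < alpha)
    return results


# Made-up data with the sizes from the question (16 and 28 rows).
rng = np.random.default_rng(0)
df_a = pd.DataFrame({"a": rng.normal(0, 1, 16), "b": rng.normal(0, 1, 16)})
df_b = pd.DataFrame({"a": rng.normal(0, 1, 28), "b": rng.normal(3, 1, 28)})

results = mannwhitney_by_column(df_a, df_b)
for name, (stat, pvalue, significant) in results.items():
    print(f"Mann-Whitney U test of {name}: stat = {stat}, pvalue = {pvalue}")
    if significant:
        print(f"Pvalue for {name} < 0.01")
```

Returning the results in a dict instead of only printing them makes it easier to reuse the p-values afterwards, for example to apply a multiple-comparison correction when there are many columns.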
