
I have a folder with about 100 CSV files. I want to run a two-sample Kolmogorov-Smirnov test on every possible pair of files. I can do this manually for one pair like this:

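import pandas as pd
from scipy import stats

df = pd.read_csv(r'file1.csv')
df2 = pd.read_csv(r'file2.csv')
# ks_2samp compares two 1-D samples, so pass one column from each file
# (the first column here; substitute the column you actually want to test)
stats.ks_2samp(df.iloc[:, 0], df2.iloc[:, 0])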

but I don't want to manually assign all the variables. Is there a way to iterate through the files and compare all the possible combinations using the statistical test?

Stefano Potter



It sounds like you want the Cartesian product of a list of filenames with itself.

See also: Cartesian product of lists in Python

In your implementation, collect all the filenames in a list (for example with glob.glob('*.csv')), then call

import itertools
itertools.product(files, files)
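To see what product(files, files) actually yields, here is a small sketch using 100 hypothetical file names standing in for the real ones. It pairs every file with itself and produces each pair in both orders. Because the two-sided KS statistic does not depend on which sample comes first, itertools.combinations(files, 2), which yields each distinct pair once, may be a closer fit for "every possible combination".

```python
import itertools

# Hypothetical stand-ins for the ~100 CSV file names.
files = [f"file{i}.csv" for i in range(1, 101)]

ordered_pairs = list(itertools.product(files, files))     # includes (f, f) and both (a, b) and (b, a)
unordered_pairs = list(itertools.combinations(files, 2))  # each distinct pair exactly once

print(len(ordered_pairs), len(unordered_pairs))  # 10000 4950
print(ordered_pairs[:2])    # [('file1.csv', 'file1.csv'), ('file1.csv', 'file2.csv')]
print(unordered_pairs[:2])  # [('file1.csv', 'file2.csv'), ('file1.csv', 'file3.csv')]
```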

The documentation for itertools.product notes that product(A, B) returns the same as

((x,y) for x in A for y in B)
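Putting it together, here is a minimal sketch of the whole loop. It assumes the CSV files sit in one folder (the folder name "data" is a hypothetical placeholder) and that the values to compare are in each file's first column; adjust both to your data. The helper name pairwise_ks is also just illustrative. Each file is read once, and itertools.combinations is used so each pair is tested exactly once.

```python
import glob
import itertools
import os

import pandas as pd
from scipy import stats


def pairwise_ks(folder):
    """Run a two-sample KS test on every unordered pair of CSV files in folder.

    Assumption: the sample to compare is in the first column of each file.
    """
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Read each file once instead of once per pair; drop missing values.
    samples = {f: pd.read_csv(f).iloc[:, 0].dropna() for f in files}
    results = {}
    # combinations(files, 2) yields each unordered pair once, with no self-pairs.
    for a, b in itertools.combinations(files, 2):
        res = stats.ks_2samp(samples[a], samples[b])
        results[(os.path.basename(a), os.path.basename(b))] = (res.statistic, res.pvalue)
    return results


if __name__ == "__main__":
    # "data" is a hypothetical folder name; point this at your CSV folder.
    for (a, b), (d, p) in pairwise_ks("data").items():
        print(f"{a} vs {b}: D={d:.3f}, p={p:.3g}")
```

With about 100 files this runs 4,950 tests, one per distinct pair.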
Cameron Aavik