I have an numpy array of size
arr.size = (200, 600, 20).
I want to compute scipy.stats.kendalltau
on every pairwise combination of the last two dimensions. For example:
kendalltau(arr[:, 0, 0], arr[:, 1, 0])
kendalltau(arr[:, 0, 0], arr[:, 1, 1])
kendalltau(arr[:, 0, 0], arr[:, 1, 2])
...
kendalltau(arr[:, 0, 0], arr[:, 2, 0])
kendalltau(arr[:, 0, 0], arr[:, 2, 1])
kendalltau(arr[:, 0, 0], arr[:, 2, 2])
...
...
kendalltau(arr[:, 598, 20], arr[:, 599, 20])
such that I cover all combinations of arr[:, i, xi]
with arr[:, j, xj]
with i < j
and xi in [0,20)
, xj in [0, 20)
. This is (600 choose 2) * 400
individual calculations, but since each takes about 0.002 s
on my machine, it shouldn't take much longer than a day with the multiprocessing module.
What's the best way to go about iterating over these columns (with i<j
)? I figure I should avoid something like
for i in range(600):
for j in range(i+1, 600):
for xi in range(20):
for xj in range(20):
What is the most numpythonic way of doing this?
Edit: I changed the title since Kendall Tau isn't really important to the question. I realize I could also do something like
import itertools as it
for i, j in it.combinations(xrange(600), 2):
for xi, xj in product(xrange(20), xrange(20)):
but there's got to be a better, more vectorized way with numpy.