I am looking at the joblib examples, but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix, so I was doing:
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j] = dist_fun(data[i], data[j])
    dist_mat[j, i] = dist_mat[i, j]
where dist_fun takes two vectors and computes a distance. How can I make this loop parallel, given that the calls to dist_fun are all independent of each other?
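For concreteness, what I imagine (but have not gotten working) is something along the following lines, with n_jobs=-1 supposedly using all cores; the way I gather the results back into dist_mat afterwards is just my guess, not something I know to be the idiomatic joblib pattern:

from joblib import Parallel, delayed
import numpy as np

# data and dist_fun as defined above
N, _ = data.shape
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]

# Compute each pairwise distance as an independent task.
distances = Parallel(n_jobs=-1)(
    delayed(dist_fun)(data[i], data[j]) for (i, j) in upper_triangle
)

# Fill the symmetric distance matrix from the flat list of results.
dist_mat = np.zeros((N, N))
for (i, j), d in zip(upper_triangle, distances):
    dist_mat[i, j] = d
    dist_mat[j, i] = d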
EDIT: The distance function I am using is fastdtw, which is not so fast, so I think I really do want to parallelize this. Using:
dist_mat = pdist(data, lambda x, y: fastdtw(x, y, dist=euclidean)[0])
I get an execution time of 58.1084 secs, and using:
dist_mat = np.zeros((N, N))
for (i, j), _ in np.ndenumerate(dist_mat):
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
I get 116.36 seconds and using:
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
dist_mat = np.zeros((N, N))
for (i, j) in upper_triangle:
    dist_mat[i, j], _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
    dist_mat[j, i] = dist_mat[i, j]
I get 55.62 secs. Here N=33. Does scipy automatically make use of all available cores?
EDIT: I think I have found a workaround using the multiprocessing package, but I will leave the question unanswered for the joblib folks to respond before I post what I think works.
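For reference, the rough shape of the workaround I have in mind is below. The worker function pair_dist and the data loading line are placeholders of my own, and the sketch assumes data is a module-level array that the worker processes can see (which, as far as I know, only works cleanly on platforms that fork):

from multiprocessing import Pool
import numpy as np
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw

data = np.load('data.npy')  # placeholder for the (N, d) array of time series

def pair_dist(pair):
    # Worker: DTW distance between one (i, j) pair of rows of data.
    i, j = pair
    d, _ = fastdtw(data[i, :], data[j, :], dist=euclidean)
    return i, j, d

if __name__ == '__main__':
    N, _ = data.shape
    upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
    dist_mat = np.zeros((N, N))
    with Pool() as pool:  # defaults to one worker per core
        for i, j, d in pool.imap_unordered(pair_dist, upper_triangle):
            dist_mat[i, j] = d
            dist_mat[j, i] = d

Using imap_unordered means results are filled in as soon as any worker finishes a pair, which is fine here since each result carries its own (i, j) indices.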