
I have two lists of vectors.

import numpy as np

A = np.random.rand(100, 2000)
B = np.random.rand(100, 1000)

I need to calculate the outer product of the first entry of A with the first entry of B. Then the second, then the third and so on.

A naive loop

outers = []
for a, b in zip(A, B):
    # outer product of the i-th row of A with the i-th row of B
    outers.append(np.outer(a, b))

takes ≈ 730 ms (measured via %timeit) on my computer.

In the end, outers is a 100-entry list of 2000×1000 arrays, which is correct.

There must be a more efficient way of parallelising this task: as written, we compute the outer product of A[0] and B[0], THEN of A[1] and B[1], and so on, even though all of these products are independent and could be computed in parallel.
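
For reference, the same batch of outer products can also be written as a single vectorized NumPy expression (a minimal sketch; this removes the Python loop but still runs serially):

# result[i] == np.outer(A[i], B[i]) for every i; shape (100, 2000, 1000)
result = np.einsum('ij,ik->ijk', A, B)

# equivalent broadcasting form:
# result = A[:, :, None] * B[:, None, :]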


1 Answer


If you want to do NumPy array operations in parallel, Dask is an excellent choice. For example, you can do this operation as follows:

import dask.array as da

# Wrap the NumPy arrays as Dask arrays, split into chunks of 10 rows each
dA = da.from_array(A, chunks=(10, A.shape[1]))
dB = da.from_array(B, chunks=(10, B.shape[1]))

# Broadcasted multiply builds a lazy task graph for all 100 outer products:
# (100, 2000, 1) * (100, 1, 1000) -> (100, 2000, 1000)
task_graph = dA[:, :, None] * dB[:, None]

# Trigger the actual (potentially parallel) computation
result = task_graph.compute()

The compute() step is flexible enough to apply the computation on multiple threads, multiple processes, multiple cores, multiple machines, etc.
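
For example (a minimal sketch of the standard scheduler options):

result = task_graph.compute(scheduler='threads')      # local thread pool
result = task_graph.compute(scheduler='processes')    # local process pool
result = task_graph.compute(scheduler='synchronous')  # single thread, useful for debugging

For a cluster, creating a dask.distributed Client first makes compute() use the distributed scheduler automatically.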

For the particular example in your question, you're not going to gain much over a serial approach, as the overhead involved in chunking the input arrays and concatenating the output array is significant compared to the cost of simply doing 100 outer products. For larger problems, though, such an approach can lead to significant speedups.
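
If you want to measure that tradeoff on your own data (a rough sketch; timings vary by machine, and each result occupies roughly 1.6 GB), you can time both versions and check that they agree:

import time
import numpy as np

t0 = time.perf_counter()
serial = np.stack([np.outer(a, b) for a, b in zip(A, B)])
t1 = time.perf_counter()
parallel = task_graph.compute()
t2 = time.perf_counter()

print(f"serial: {t1 - t0:.3f} s, dask: {t2 - t1:.3f} s")
print("results match:", np.allclose(serial, parallel))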

jakevdp