
I have large arrays to multiply over a large number of iterations.

I am training a model with arrays of length around 1500, and I will perform 3 multiplications about 1,000,000 times, which takes a long time (almost a week).

I found Dask and tried to compare it with the normal NumPy approach, but I found NumPy faster:

import time

import numpy as np
import dask.array as da

x = np.arange(2000)

# Dask version
start = time.time()
y = da.from_array(x, chunks=(100,))

for i in range(0, 100):
    p = y.dot(y)

# print(p)
print(time.time() - start)

print('------------------------------')

# Plain NumPy version
start = time.time()

p = 0

for i in range(0, 100):
    p = np.dot(x, x)

print(time.time() - start)

Output:

0.08502793312072754
0.00015974044799804688

Am I using Dask wrong, or is NumPy really that fast?

Obadah Meslmani
1 Answer


Performance for .dot strongly depends on the BLAS library to which your NumPy implementation is linked.

If you have a modern implementation like OpenBLAS or MKL then NumPy is already running at full speed using all of your cores. In this case dask.array will likely only get in the way, trying to add further parallelism when none is warranted, causing thread contention.

If you have installed NumPy through Anaconda then you likely already have OpenBLAS or MKL, so I would just be happy with the performance that you have and call it a day.
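
If you are not sure which BLAS your NumPy build uses, one way to check (the exact output depends on your NumPy version and how it was installed) is:

import numpy as np

# Show the BLAS/LAPACK libraries this NumPy build is linked against;
# look for entries such as mkl_info or openblas_info in the output.
np.show_config()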

However, in your actual example you're using chunks that are far too small (chunks=(100,)). The dask task scheduler incurs about a millisecond of overhead per task. You should choose a chunksize so that each task takes somewhere in the 100s of milliseconds in order to hide this overhead. Generally a good rule of thumb is to aim for chunks that are above a megabyte in size. This is what is causing the large difference in performance that you're seeing.
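
As a rough sketch of what the chunk-size advice means in practice (the array length and chunk size here are illustrative choices, not taken from the question), something like the following keeps each chunk well above a megabyte and calls .compute() so the task graph actually runs:

import time

import numpy as np
import dask.array as da

x = np.arange(1e7)                     # 10 million float64 values, ~80 MB
y = da.from_array(x, chunks=1000000)   # ten chunks of ~8 MB each

start = time.time()
p = y.dot(y).compute()                 # .compute() executes the lazy graph
print('dask :', time.time() - start)

start = time.time()
q = np.dot(x, x)
print('numpy:', time.time() - start)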

MRocklin
  • I increased the chunk size and the best I got was 0.054, which is still far from NumPy. I think, as you said, NumPy is already parallelized. Thanks for your detailed explanation, it is much clearer now – Obadah Meslmani Jun 23 '16 at 21:24
  • Yeah, for small fast problems the overhead of parallel computing frameworks usually gets in the way more than it helps. – MRocklin Jun 23 '16 at 23:38