Fastest way to compute matrix dot product

Question

I compute the dot product as follows:

import numpy as np
A = np.random.randn(80000, 3000)
B = np.random.randn(3000, 50)
C = np.dot(A, B)

Running this script takes about 9 seconds:

Mac@MacBook-Pro:~/python_dot_product$ time python dot.py 

real    0m9.042s
user    0m10.927s
sys     0m0.911s

Could I do any better? Does numpy already spread the work across the available cores optimally?

That should be the fastest way. If you're looking to benchmark another solution, try using [the `@` operator](https://www.python.org/dev/peps/pep-0465/) — inspectorG4dget, May 08 '17 at 20:50
Use `np.__config__.show()` to investigate which library it is using for matrix multiplication. Another way would be to open the system monitor and check visually. — Divakar, May 08 '17 at 20:51
Other than changing the BLAS backend that `numpy` uses, you probably aren't going to get faster than this. — juanpa.arrivillaga, May 08 '17 at 20:52
When I rerun your code it takes me 14 seconds, and 13 of those seconds are spent just creating `A` and `B`. So as you test alternatives, keep in mind to time only the operation itself (`np.dot` in this case). This should give more relevant comparisons. — Max Power, May 08 '17 at 21:25
I thought I'd answer this using `multiprocessing`'s `pool.map` on `np.dot` but that took me 6x as long. Looking some more, the second and third answers (but not the first/accepted one) at the link below should be helpful. http://stackoverflow.com/questions/11442191/parallelizing-a-numpy-vector-operation — Max Power, May 08 '17 at 21:27

score 3 · Accepted Answer · edited May 23 '17 at 10:31

The last two answers to this SO question should be helpful.

The last one pointed me to SciPy documentation, which includes this quote:

"[np.dot(A,B) is evaluated using BLAS, which] will normally be a library carefully tuned to run as fast as possible on your hardware by taking advantage of cache memory and assembler implementation. But many architectures now have a BLAS that also takes advantage of a multicore machine. If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything."

So it sounds like it depends on your specific hardware and on how your NumPy/SciPy was compiled. If it is linked against a multithreaded BLAS, np.dot(A,B) may use multiple cores/processors; otherwise it might not.
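As suggested in the comments, you can print the build configuration to see which BLAS/LAPACK libraries your NumPy was built against. This is only a minimal sketch: the output format differs between NumPy versions, and reporting the CPU count through `os.cpu_count()` is my own addition for context, not something from the quoted documentation.

```python
import os
import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK
# libraries it was linked against (format varies by NumPy version).
np.show_config()

# Number of logical CPUs visible to Python (may be None on some platforms).
n_cpus = os.cpu_count()
print("logical CPUs:", n_cpus)
```

If the output mentions a library such as OpenBLAS, MKL, or Accelerate, `np.dot` is probably already dispatching to a tuned, possibly multithreaded, implementation.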

To find out which case is yours, I suggest running your toy example (with larger matrices if needed) while you have your system monitor open, so you can see whether just one CPU spikes in activity or several do.
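Following the advice in the comments to time only the `np.dot` call, here is a rough sketch that excludes the cost of creating the matrices. The helper name `time_dot`, the smaller default sizes, and the repeat count are my own choices for illustration; substitute the shapes from the question (80000 x 3000 and 3000 x 50) to reproduce the original workload.

```python
import time
import numpy as np

def time_dot(m=2000, k=500, n=50, repeats=3):
    """Return (best wall-clock seconds for np.dot alone, result shape)."""
    rng = np.random.default_rng(0)
    # Matrix creation happens outside the timed region.
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((k, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        C = np.dot(A, B)
        best = min(best, time.perf_counter() - start)
    return best, C.shape

elapsed, shape = time_dot()
print(f"np.dot alone: {elapsed:.4f} s, result shape {shape}")
```

Watching the system monitor while this runs with large enough matrices should show whether one core or several are busy during the multiplication.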
