I have a (40000, 40000) array A and a vector v of size 40000. When I run
u = A.dot(v)
only one core is used. Is there any standard way to make it run in parallel?
I am using the Anaconda distribution on RedHat. I have seen a lot of questions and answers about BLAS/PBLAS/ATLAS/OpenBLAS, but I cannot find my way around them.
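
In case it helps to reproduce, here is a minimal sketch of the setup. The size is reduced from 40000 so it runs quickly, and the random float64 data is only an assumption for illustration. The last line prints which BLAS/LAPACK libraries the NumPy build is linked against, which is probably relevant to whether the product can use more than one core.

```python
import numpy as np

# Reduced size so the sketch runs quickly; the real problem uses n = 40000.
n = 2000

# Random float64 data, assumed only for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)

# The matrix-vector product in question.
u = A.dot(v)

# Print the BLAS/LAPACK build information for this NumPy installation.
np.show_config()
```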