
I am using numpy.linalg.eigvals() to obtain all the eigenvalues of a matrix. The matrices are large, at least 15000×15000, filled with complex128 values.

I have access to a cluster where I can request many CPU cores to run the computation, so I would like to know:

  1. Does numpy.linalg.eigvals() automatically use all the cores available on a given compute node to speed up the diagonalisation process?
  2. If it does not do so automatically, is there a way to specify how many cores it should use?

Please note that my code uses Numba. I am open to using scipy.linalg if it is better, but numpy.linalg is easier to use with Numba.
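For reference, a minimal sketch of the kind of call I am making (the size is reduced here; the real matrices are 15000×15000 complex128):

```python
import numpy as np

# Illustrative size; the real matrices are ~15000x15000.
n = 2000
rng = np.random.default_rng(0)

# Build a dense complex128 matrix as a stand-in for the real one.
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))).astype(np.complex128)

# The call in question: does this use all the cores available on the node?
eigenvalues = np.linalg.eigvals(H)
```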

Thank you.

Geoff
  • tl;dr YES. Does this answer your question? [Limit number of threads in numpy](https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) – Aaron Jan 21 '22 at 15:35
  • Can you clarify? You talk about cores available, but then you talk about nodes on a cluster. It's not clear whether you are asking about MPI parallelism across nodes, or threaded parallelism within a single node. Generally numpy can use threads, but it does not transparently use MPI to increase performance. – tgpz Jan 21 '22 at 15:48
  • @tgpz complex128 does not use 128-bit precision. It uses two 64-bit values (real + imaginary part). So this is fine here. – Jérôme Richard Jan 21 '22 at 16:00
  • You should think of numpy and scipy's linalg modules as wrappers for openblas/mkl/eigen, depending on what they're linked to. The parallelization will depend on the linked libraries. – CJR Jan 21 '22 at 16:04
  • @tgpz Thanks. Let me try to clarify. I can specify the number of CPU cores needed to run my job on the cluster. I was wondering whether simply requesting more CPU cores (say 8 instead of 1) would improve the performance of np.linalg.eigvals. I think I am therefore asking about multiprocessing, but I am quite new to all of this, so my understanding of multithreading vs multiprocessing is shaky. – Geoff Jan 21 '22 at 16:33
  • If you use an HPC cluster with a job scheduler like SLURM, then AFAIK a full CPU is reserved (often even a whole node). The pinning of threads may change, though. If the underlying library is configured to use more cores than reserved, performance will be very bad. The opposite is less of a problem, although AFAIK most libraries do not scale well in that case. We need more information to help you, such as which BLAS/LAPACK library you use (a critical point). Alternatively, you can check with `top` whether multiple cores are used when you reserve more cores from the job scheduler and report that here (a sketch of how to inspect and cap the thread count is included after the comments). – Jérôme Richard Jan 21 '22 at 17:37
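Following the comments, a minimal sketch of how the BLAS/LAPACK backend and its thread count can be inspected and capped. The use of threadpoolctl and the thread count of 8 are illustrative assumptions; setting environment variables such as OMP_NUM_THREADS / OPENBLAS_NUM_THREADS / MKL_NUM_THREADS before importing numpy, as in the linked question, works as well:

```python
import os

# Cap the threads used by the underlying BLAS/LAPACK (OpenBLAS, MKL, ...).
# The value 8 is only an example; match it to the cores reserved for the job.
# These must be set *before* numpy is imported.
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["OPENBLAS_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits  # pip install threadpoolctl

# Report which BLAS/LAPACK library numpy is linked against and how many
# threads it is currently configured to use.
for pool in threadpool_info():
    print(pool["internal_api"], pool.get("num_threads"))

n = 2000  # illustrative size
rng = np.random.default_rng(0)
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))).astype(np.complex128)

# Alternatively, cap the thread count for just this call at runtime.
with threadpool_limits(limits=8):
    w = np.linalg.eigvals(H)
```

Whether eigvals actually scales across 8 cores then depends on the linked BLAS/LAPACK library, as noted in the comments; watching `top` while the job runs is the quickest way to confirm.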

0 Answers