I need to diagonalize complex (Hermitian) matrices of dimension > 2000 using `numpy.linalg.eigvalsh`. On one computer, `top` shows that NumPy is multithreading, while on the other it shows a single thread. Both computers have essentially identical OSs (Arch Linux, Python 3.10). The output of `numpy.show_config()` is absolutely identical on both machines. The machine on which I see multithreading is a laptop with 16 GB RAM and an i7-8550U @ 1.80GHz CPU (4 physical cores). The one on which I don't see it has an Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz (48 physical cores) and 180 GB RAM. Is this behaviour expected? What am I missing? Thanks!
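A minimal reproduction of the setup described above (the matrix size and seed are arbitrary choices, not from the question): build a random Hermitian matrix and diagonalize it with `numpy.linalg.eigvalsh`. With a threaded OpenBLAS, `top` should show several busy threads while the call runs.

```python
import numpy as np

# Build a random complex matrix and symmetrize it so it is Hermitian.
rng = np.random.default_rng(0)
n = 2000
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2  # Hermitian by construction: h == h.conj().T

# eigvalsh returns the real eigenvalues in ascending order; this is the
# call whose BLAS/LAPACK backend may or may not spawn worker threads.
w = np.linalg.eigvalsh(h)
print(w.shape, w.dtype)
```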
- What BLAS did you use? Can you print the result of `numpy.show_config()`? Can you check which dynamic libraries are loaded (see [this post](https://stackoverflow.com/questions/5103443/how-to-check-what-shared-libraries-are-loaded-at-run-time-for-a-given-process)) and report the list? – Jérôme Richard May 27 '22 at 13:26
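Both checks suggested in this comment can be done from Python. A sketch, assuming Linux (the `/proc/self/maps` interface is Linux-specific): print the build configuration and then the BLAS/LAPACK shared objects actually mapped into the running process, since the build config alone does not prove which `.so` is loaded at runtime.

```python
import numpy as np

# What NumPy was built against (compile-time information).
np.show_config()

# What is actually loaded at runtime: scan this process's memory map
# for BLAS/LAPACK shared libraries (Linux only).
with open("/proc/self/maps") as f:
    loaded = sorted({line.split()[-1] for line in f
                     if "blas" in line.lower() or "lapack" in line.lower()})
for path in loaded:
    print(path)
```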
- I only have OpenBLAS installed on both machines. As far as I could see, the dynamic libraries being loaded are the same on both machines. Here are the lists: https://drive.google.com/file/d/1qrm80LKfPA4cuZVEouaia1NMAJp7NQgV/view?usp=sharing https://drive.google.com/file/d/1vVDiec8kestL7vYpyXpH0ny2EqhU5g6n/view?usp=sharing – Antonio Costa May 27 '22 at 13:59
- And the outputs of `numpy.show_config()`: https://drive.google.com/file/d/1ocKlWPQQdVOo0AlQ0C10Xk8_JQDcgGcd/view?usp=sharing https://drive.google.com/file/d/1yE8WvJHTN73FZem2N5ruG9N0RSSLM_96/view?usp=sharing Thanks! – Antonio Costa May 27 '22 at 14:06
- Indeed, interesting. Can you try with much bigger matrices on both machines, something like 3x-4x bigger? And the same thing with smaller matrices on both. It might be due to a threshold in the OpenBLAS code. You should also check whether the file `/usr/lib/libcblas.so.3.10.1` is a symlink (it certainly is) and whether it ultimately points to the same version of OpenBLAS. – Jérôme Richard May 27 '22 at 15:48
- I have just tried a 14400x14400 matrix, same behaviour. `/usr/lib/libcblas.so.3.10.1` is not a symlink on either system. Should it be? – Antonio Costa May 27 '22 at 21:07
- Ok. Thank you for this useful information. libcblas can be a symlink to OpenBLAS (or another BLAS) on some systems, but it can also be a library that depends on other libraries (certainly your case). OpenBLAS is linked in both cases and the order is similar, so I expect OpenBLAS to be used on both machines (the versions look the same too). It may be an issue with your environment or a bug in OpenBLAS. Can you report `os.environ` on both machines? – Jérôme Richard May 27 '22 at 21:27
- It definitely looks like something in the environment: I have been running the code on the Xeon machine through Slurm. Just by accident, today I logged directly into the compute node, and now `top` shows all available CPUs being used. As soon as I nail down which component of the environment is the culprit, I will let you know. Thanks a lot!! – Antonio Costa May 30 '22 at 07:44
- Ah, you use Slurm! AFAIK it can control the binding of the threads depending on the script used to schedule the job (I remember there was a bug in Slurm related to that on some HPC machines, btw). The best thing to do is to manually bind the threads if possible. OpenBLAS should use OpenMP internally, which can be controlled using `OMP_PROC_BIND` (set to `TRUE`) and `OMP_PLACES`. You can also check the OpenMP config with `OMP_DISPLAY_ENV` (set to `TRUE` or `VERBOSE`). For the Intel runtime, you can also try `KMP_AFFINITY=verbose`. – Jérôme Richard May 30 '22 at 09:17