
I have a project in JupyterLab that involves minimizing an objective with SciPy, which performs a lot of matrix multiplication. I time this with the %%time cell magic, which prints both CPU and wall time. These are usually similar, but recently (about two weeks ago) I noticed that wall time was consistently about half the CPU time across different optimization parameter settings. The timing output of one such cell is below

CPU times: total: 6min 4s Wall time: 3min 18s

This ~2x speedup was consistent across multiple runs, including much longer ones (e.g. 1.5 hours of CPU time for ~45 minutes of wall time), so it isn't due to random runtime fluctuations. I don't use (or even import) multiprocessing in my code, and I wasn't aware of any built-in multiprocessing functionality in Jupyter. A 2x speedup would make sense if I explicitly used multiprocessing, since my laptop has one additional core it could use, but I have no idea how this could happen automatically. My question is general: is some kind of multiprocessing built into JupyterLab and/or NumPy under the hood, such as a default number of available cores?
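For reference, the effect can be reproduced outside of %%time with the standard library's timers: time.process_time counts CPU time summed over all threads of the process, while time.perf_counter measures wall-clock time. A minimal sketch (array sizes are arbitrary):

```python
import time
import numpy as np

# A large matrix multiply; the underlying BLAS may spread this
# across multiple cores without any explicit parallel code.
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

cpu0, wall0 = time.process_time(), time.perf_counter()
c = a @ b
cpu1, wall1 = time.process_time(), time.perf_counter()

print(f"CPU time:  {cpu1 - cpu0:.3f} s")
print(f"Wall time: {wall1 - wall0:.3f} s")
# CPU time noticeably exceeding wall time means more than one
# core was busy during the computation.
```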

For version details, running !jupyter --version gives

Selected Jupyter core packages...
IPython          : 8.12.0
ipykernel        : 6.19.2
ipywidgets       : not installed
jupyter_client   : 8.1.0
jupyter_core     : 5.3.0
jupyter_server   : 1.23.4
jupyterlab       : 3.5.3
nbclient         : 0.5.13
nbconvert        : 6.5.4
nbformat         : 5.7.0
notebook         : 6.5.4
qtconsole        : not installed
traitlets        : 5.7.1

I am also using NumPy 1.23.5 and SciPy 1.10.0.

chrysaor4
SciPy can use BLAS under the hood, which can use multi-threaded computation even in places where you don't explicitly ask for it. This depends on the specific BLAS or LAPACK libraries in use. See [here](https://stackoverflow.com/questions/35101312/multi-threaded-integer-matrix-multiplication-in-numpy-scipy) for an example of matrix multiplies using multiple cores. – Nick ODell May 04 '23 at 01:45
    See also [threadpoolctl](https://pypi.org/project/threadpoolctl/), which can be used to control the level of parallelism from BLAS. – Nick ODell May 04 '23 at 01:46
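The threadpoolctl package mentioned above can both report which BLAS library NumPy is linked against and temporarily cap its thread count. A minimal sketch, assuming threadpoolctl has been installed with pip:

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Report each thread pool NumPy/SciPy is linked against
# (e.g. OpenBLAS or MKL) and its current thread count.
for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])

a = np.random.rand(1500, 1500)

# Restrict BLAS to one thread inside the context; the matmul here
# runs serially, so CPU time should roughly match wall time.
with threadpool_limits(limits=1, user_api="blas"):
    b = a @ a
```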

1 Answer


It turns out that, as the commenter above said, certain BLAS operations such as np.dot are multithreaded by default (see here and here).
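One way to confirm this (or to restore serial behavior) is to cap the BLAS thread count via environment variables set before NumPy is imported. Which variable takes effect depends on the BLAS build, so a common hedge is to set all of them; a sketch:

```python
import os

# These must be set *before* NumPy is imported; which one applies
# depends on whether NumPy is linked against OpenBLAS, MKL, or an
# OpenMP-based BLAS.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

a = np.random.rand(1000, 1000)
c = np.dot(a, a)  # now runs on a single core
print(c.shape)
```

With these limits in place, %%time should again report CPU and wall times that are close to each other.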
