My question is about the following code:
%%time
import numpy as np
n_elems = 95
n_repeats = 100000
for i in range(n_repeats):
    X = np.random.rand(n_elems, n_elems)
    y = np.random.rand(n_elems)
    _ = X.dot(y)
I run this in IPython (version 6.2.1) with Python 3.5.5 and numpy version 1.14.0 on an 8-core machine.
I get the following output:
CPU times: user 8.93 s, sys: 439 ms, total: 9.37 s
Wall time: 8.79 s
When n_elems is set between 1 and 95, the CPU and wall time are roughly equivalent. In addition, the CPU usage of the process (as seen using top) only goes up to 100%.
However, when n_elems is set to 96, I get the following:
CPU times: user 39.4 s, sys: 1min 28s, total: 2min 8s
Wall time: 16.2 s
There is now a noticeable difference between the CPU and wall time. Also, the CPU usage reaches close to 800%.
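To check that this gap is not just an artifact of %%time, the same comparison can be sketched outside IPython with the standard library. This is a minimal sketch, not a definitive benchmark: the names measure and busy_work are hypothetical, and it relies on time.process_time reporting user plus system CPU time for the whole process, so a CPU time well above the wall time suggests that several threads were busy at once.

```python
import time


def measure(fn):
    """Return (cpu_seconds, wall_seconds) for a single call to fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def busy_work():
    # Stand-in workload (hypothetical); replace with the X.dot(y) loop above.
    return sum(i * i for i in range(200000))


cpu_seconds, wall_seconds = measure(busy_work)
print("CPU: {:.3f} s, wall: {:.3f} s".format(cpu_seconds, wall_seconds))
```

For a single-threaded workload like busy_work the two numbers should be close; for the n_elems = 96 case the CPU figure would be expected to exceed the wall figure, as in the output above.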
Similar behaviour is observed for larger values of n_elems.
I think this is because at a certain array size the numpy operation becomes multithreaded.
Could someone clarify this?
Also, is there a way to restrict the CPU usage of the process to 100%?
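One thing that may be worth trying, assuming numpy here is linked against a multithreaded BLAS such as OpenBLAS or MKL (the setup above does not say which), is to cap the thread count through environment variables before numpy is imported. The sketch below sets the variables commonly used by OpenMP, OpenBLAS, and MKL; which one actually takes effect depends on the BLAS backend, so treat this as an experiment rather than a confirmed fix.

```python
import os

# Assumption: numpy is backed by a threaded BLAS (OpenBLAS or MKL).
# These variables are typically read when the library is loaded,
# so they are set before numpy is imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # imported only after the limits are in place

n_elems = 96
X = np.random.rand(n_elems, n_elems)
y = np.random.rand(n_elems)
result = X.dot(y)
print(result.shape)
```

The same variables can also be exported in the shell before launching IPython, which avoids depending on import order inside the session.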