I have a machine with 24 physical cores (at least I was told so) running Debian: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux. It seems to be correct:
usr@machine:~/$ cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
<...>
processor : 22
processor : 23
I had some issues trying to load all cores with Python's multiprocessing.pool.Pool. I used Pool(processes=None); the docs say that Python uses cpu_count() if None is provided.
Alas, only 8 cores were 100% loaded; the others remained idle (I used htop to monitor CPU load). I assumed I was misusing Pool and tried to start 24 processes "manually":
from multiprocessing import Process

print('Starting processes...')
procs = []
for param_set in all_params:  # 24 items
    # one process per parameter combination
    p = Process(target=_wrap_test, args=(param_set,))
    p.start()
    procs.append(p)
print('Now waiting for them.')
for p in procs:
    p.join()
I had 24 "greeting" messages from the processes I started:
Starting processes...
Executing combination: Session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 5, alpha: 0.01, reduce: 500
< ... 22 more messages ... >
Executing combination: Session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 7, alpha: 0.01, reduce: 2000
Now waiting for them.
But still only 8 cores were loaded:
I've read here on SO that there may be issues with numpy, OpenBLAS and multicore execution. This is how I start my code:
OPENBLAS_MAIN_FREE=1 python -m tests.my_module
And after all imports I do:
os.system("taskset -p 0xff %d" % os.getpid())
So, here is the question: what should I do to get 100% load on all cores? Is this just my poor Python usage, or does it have something to do with OS limitations on multicore machines?
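One way to tell a Python problem from an OS restriction might be to compare what cpu_count() reports with the set of CPUs the process is actually allowed to run on. This is only a sketch: usable_cpus is a hypothetical helper name, and os.sched_getaffinity exists only on Linux with Python 3.3 or newer, which is an assumption because the code above is Python 2. On Python 2, running taskset -p with the process ID prints the current affinity mask instead.

```python
import multiprocessing
import os


def usable_cpus():
    """Return (number of CPUs in the system, sorted CPUs this process may use).

    The second element is None where os.sched_getaffinity is unavailable
    (non-Linux platforms or Python older than 3.3).
    """
    total = multiprocessing.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = None
    return total, allowed


total, allowed = usable_cpus()
print("cpu_count() reports: %d" % total)
print("CPUs this process may run on: %s" % allowed)
```

If the second line lists fewer CPUs than the first line reports, something has narrowed the affinity of the process, and child processes started from it would inherit that restriction.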
UPDATED: one more interesting thing is some inconsistency within the htop output. If you look at the image above, you'll see that the table below the CPU load bars shows 30-50% load for many more than 8 cores, which is definitely different from what the load bars say. top, on the other hand, seems to agree with those bars: 8 cores 100% loaded, the others idle.
UPDATED ONCE AGAIN:
I used this rather popular post on SO when I added the os.system("taskset -p 0xff %d" % os.getpid()) line after all imports. I have to admit that I didn't think too much when I did that, especially after reading this:
With this line pasted in after the module imports, my example now runs on all cores
I'm a simple man. I see "works like a charm", I copy and paste. Anyway, while playing with my code I eventually removed this line. After that, my code began executing on all 24 cores in the "manual" Process-starting scenario. For the Pool scenario the same problem remained, whether or not the affinity trick was used.
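A possible explanation for the "manual" case, as far as I can tell: the value passed to taskset -p is a bitmask with one bit per CPU, so 0xff has only bits 0 to 7 set and would pin the process (and the children it starts afterwards) to 8 CPUs. The sketch below just does that arithmetic; cpus_in_mask and mask_for are hypothetical helper names, not part of any library.

```python
def cpus_in_mask(mask):
    """Return the CPU indices whose bits are set in an affinity bitmask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def mask_for(n_cpus):
    """Return a bitmask with the lowest n_cpus bits set (CPUs 0..n_cpus-1)."""
    return (1 << n_cpus) - 1


# The mask from the copied line covers 8 CPUs only.
print(len(cpus_in_mask(0xff)))

# A mask covering all 24 CPUs of this machine would need 24 set bits.
print(hex(mask_for(24)))
```

This would be consistent with exactly 8 cores being busy while the line was present, but it does not explain why Pool still misbehaves without it.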
I don't think it's a real answer because I don't know what the issue is with Pool, but at least I managed to get all cores fully loaded. Thank you!