
My main problem is described in the issue here. Since no one has offered a solution yet, I have decided to look for a workaround. I am looking for a way to limit a Python script's CPU usage (not its priority, but the number of CPU cores it uses) from within Python code. I know I can do that with the multiprocessing library (Pool, etc.), but I am not the one running the script with multiprocessing, so I don't know how to do that. I could also do it via the terminal, but this script is imported by another script, so unfortunately I don't have the luxury of launching it from the terminal.

tl;dr: How do I limit the CPU usage (number of cores) of a Python script that is imported by another script, runs in parallel for reasons I don't understand, and cannot be launched via the terminal? Please check the code snippet below.

The code snippet causing the issue:

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import numpy as np

X, _ = load_digits(return_X_y=True)

#Copy-paste and increase the size of the dataset to see the behavior at htop.
for _ in range(8):
    X = np.vstack((X, X))

print(X.shape)

transformer = IncrementalPCA(n_components=7, batch_size=200)

#PARTIAL FIT RUNS IN PARALLEL! GOD WHY?
#---------------------------------------
transformer.partial_fit(X[:100, :])
#---------------------------------------
X_transformed = transformer.fit_transform(X)

print(X_transformed.shape)

Versions:

  • Python 3.6
  • joblib 0.13.2
  • scikit-learn 0.20.2
  • numpy 1.16.2

UPDATE: It doesn't work. Thank you for the clarification, @Darkonaut. The sad thing is that I already knew this wouldn't work, and I clearly stated it in the question title, but people don't read, I guess. I am probably doing it wrong. I've updated the code snippet based on @Ben Chaliah Ayoub's answer, and nothing seems to have changed. I also want to point something out: I am not trying to run this code on multiple cores. The line transformer.partial_fit(X[:100, :]) runs on multiple cores (for some reason), and it doesn't take n_jobs or anything similar. Also note that my first example and my original code are not initialized with a pool or anything similar, so I couldn't set the number of cores in the first place (because there was no place to set it). Now there is such a place, but it is still running on multiple cores. Feel free to test it yourself. (Code below) That's why I am looking for a workaround.

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import numpy as np
from multiprocessing import Pool, cpu_count
def run_this():
    X, _ = load_digits(return_X_y=True)
    #Copy-paste and increase the size of the dataset to see the behavior at htop.
    for _ in range(8):
        X = np.vstack((X, X))
    print(X.shape)
    #This is the exact same example taken from scikit-learn's IncrementalPCA documentation.
    transformer = IncrementalPCA(n_components=7, batch_size=200)
    transformer.partial_fit(X[:100, :])
    X_transformed = transformer.fit_transform(X)
    print(X_transformed.shape)
pool = Pool(processes=1)
pool.apply(run_this)

UPDATE: So, I have tried to set the BLAS thread counts in my code before importing numpy (as shown below), but it didn't work (again). Any other suggestions? The latest version of the code can be found below.

Credits: @Amir

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
import os
os.environ["OMP_NUM_THREADS"] = "1" # export OMP_NUM_THREADS=1
os.environ["OPENBLAS_NUM_THREADS"] = "1" # export OPENBLAS_NUM_THREADS=1
os.environ["MKL_NUM_THREADS"] = "1" # export MKL_NUM_THREADS=1
os.environ["VECLIB_MAXIMUM_THREADS"] = "1" # export VECLIB_MAXIMUM_THREADS=1
os.environ["NUMEXPR_NUM_THREADS"] = "1" # export NUMEXPR_NUM_THREADS=1

import numpy as np

X, _ = load_digits(return_X_y=True)

#Copy-paste and increase the size of the dataset to see the behavior at htop.
for _ in range(8):
    X = np.vstack((X, X))

print(X.shape)
transformer = IncrementalPCA(n_components=7, batch_size=200)

transformer.partial_fit(X[:100, :])

X_transformed = transformer.fit_transform(X)

print(X_transformed.shape)
  • Can't you solve this by running the program with `nice` instead? I gather that is what you really want? – JohanL Apr 18 '19 at 22:50
  • Isn't `nice` about changing the priority of a job? I am looking for a way to change the number of CPU cores. – MehmedB Apr 19 '19 at 06:51
  • It is, but why do you care about the number of cores, really? Aren't you more interested in making your server responsive and available to different users? – JohanL Apr 19 '19 at 06:56
  • Different teams are testing different multi-processed algorithms. It is important not to mess with another team's cores or something like that. Tbh. I'm not exactly sure why; this is what my supervisor told me. – MehmedB Apr 19 '19 at 06:59
  • OK, then perhaps search for CPU affinity and see what that can do for you. That helps you limit a program to a set of CPU cores. However, it will force you to define which cores up front and won't allow for just any *n* cores but rather e.g. cores 2, 4, and 7... – JohanL Apr 19 '19 at 07:02
  • 2
    Pool won't help you here. `sklearn` builds on `numpy`, which does [GIL](https://wiki.python.org/moin/GlobalInterpreterLock)-releasing multithreading through its math-libraries. Pool's `processes`-parameter only controls how many worker-processes are created, it doesn't enforce core-usage limits for whatever your function does. What you need to limit is the number of threads `numpy` uses. – Darkonaut Apr 19 '19 at 09:47
  • 1
    In case your numpy-build uses [MKL](https://en.wikipedia.org/wiki/Math_Kernel_Library) under the hood, your solution can be as easy as adding `import mkl; mkl.set_num_threads(1)`. Find more answers [here](https://stackoverflow.com/q/17053671/9059420) and [here](https://stackoverflow.com/q/19257070/9059420). – Darkonaut Apr 19 '19 at 09:47
  • @Darkonaut, is this a bug in numpy? Mine is using OpenBLAS under the hood. I'm trying to limit the number of threads but my original code is a bit complicated. I am still working on it. If you could show me an example with the code snippet that I provided that would be great. – MehmedB Apr 25 '19 at 06:58
  • I've just realized it is not a bug but a feature of OPENBLAS. – MehmedB Apr 25 '19 at 07:22
  • 1
    It's not a bug, rather a missing feature. For the usual usecase you would be happy to overcome the limitations of the GIL and have numpy multithreaded with true parallelism for free. But that numpy doesn't expose a high-level API to switch multithreading on/off is indeed unfortunate (see discussion on [github](https://github.com/numpy/numpy/issues/11826)). Might help for [OpenBlas](https://stackoverflow.com/q/22813923/9059420). – Darkonaut Apr 25 '19 at 07:24

2 Answers


I am looking for a way to limit a python scripts CPU usage (not priority but the number of CPU cores) with python code.

Run your application with taskset or numactl.

For example, to make your application utilize only the first 4 CPUs do:

taskset --cpu-list 0-3 <app>

These tools, however, limit the process to use specific CPUs, not the total number of used CPUs. For best results they require those CPUs to be isolated from the OS process scheduler, so that the scheduler doesn't run any other processes on those CPUs. Otherwise, if the specified CPUs are currently running other threads, while other CPUs are idle, your threads won't be able to run on other idle CPUs and will have to queue up for these specific CPUs, which isn't ideal.
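
Since the question rules out launching the script from a terminal, the same pinning can also be requested from inside Python on Linux via os.sched_setaffinity; a minimal sketch (the core indices 0-3 are just an example):

import os

# Restrict the calling process (pid 0 means "this process") and the threads it
# spawns to CPU cores 0-3. Linux-only.
os.sched_setaffinity(0, {0, 1, 2, 3})

# Check which cores the process is now allowed to run on.
print(os.sched_getaffinity(0))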

Using cgroups you can limit your processes/threads to use a specific fraction of available CPU resources without limiting to specific CPUs, but cgroups setup is less trivial.

Maxim Egorushkin

I solved the problem in the example code from the original question by setting the BLAS environment variables (from this link). My first try (second update) was wrong: the thread counts have to be set not just before importing numpy itself, but before importing the library (IncrementalPCA / scikit-learn) that imports numpy under the hood.
So what was the problem in the example code? It wasn't actually a problem but a feature of the BLAS library that numpy uses. Trying to limit it with the multiprocessing library didn't work, because by default OpenBLAS is set to use all available threads.
Credits: @Amir and @Darkonaut. Sources: OpenBLAS 1, OpenBLAS 2, Solution

import os
os.environ["OMP_NUM_THREADS"] = "1" # export OMP_NUM_THREADS=1
os.environ["OPENBLAS_NUM_THREADS"] = "1" # export OPENBLAS_NUM_THREADS=1
os.environ["MKL_NUM_THREADS"] = "1" # export MKL_NUM_THREADS=1
os.environ["VECLIB_MAXIMUM_THREADS"] = "1" # export VECLIB_MAXIMUM_THREADS=1
os.environ["NUMEXPR_NUM_THREADS"] = "1" # export NUMEXPR_NUM_THREADS=1
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA


import numpy as np

X, _ = load_digits(return_X_y=True)

#Copy-paste and increase the size of the dataset to see the behavior at htop.
for _ in range(8):
    X = np.vstack((X, X))

print(X.shape)
transformer = IncrementalPCA(n_components=7, batch_size=200)

transformer.partial_fit(X[:100, :])

X_transformed = transformer.fit_transform(X)

print(X_transformed.shape)

But you can set only the relevant BLAS variable by first checking which implementation is used by your numpy build, like this:

>>> import numpy as np
>>> np.__config__.show()

Gave these results...

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

...meaning OpenBLAS is used by my numpy build. So all I need to write is os.environ["OPENBLAS_NUM_THREADS"] = "2" (before numpy is first imported) in order to limit the number of threads the numpy library uses.
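
If setting environment variables before the first import is inconvenient, the threadpoolctl package (a separate install; I'm assuming it is available for your environment) can apply the same limit at runtime instead; a minimal sketch:

from threadpoolctl import threadpool_limits
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

X, _ = load_digits(return_X_y=True)
transformer = IncrementalPCA(n_components=7, batch_size=200)

# Limit every BLAS thread pool (OpenBLAS, MKL, ...) to a single thread for the
# duration of the with-block, regardless of the environment variables.
with threadpool_limits(limits=1, user_api='blas'):
    transformer.partial_fit(X[:100, :])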

MehmedB