scikit learn unwanted parallel processing

Question

I have a problem with nested multiprocessing which starts when I use scikit-learn (v. 0.22) Quadratic Discriminant Analysis. The relevant system configuration: a 24-thread Xeon machine running Fedora 30.

I repeatedly run QDA on a randomly selected subset of predictors:

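import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.metrics import accuracy_score

def process(X, y, n_features, i=1):
    # Pick n_features distinct column indices at random
    comb = np.random.choice(range(X.shape[1]), n_features, replace=False)
    qda = QDA(tol=1e-8)
    qda.fit(X[:, comb], y)
    y_pred = qda.predict(X[:, comb])
    return (accuracy_score(y, y_pred), comb, i)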

where n_features is the number of features randomly selected from the full set of possible predictors, and X, y are the explanatory and dependent variables.

When n_features is 18 or less, process runs in single-thread mode, which means that I can use any tool for parallel processing (I use ray). When n_features is 19 or above, for some unknown reason it (not me) starts all available threads, and the entire calculation takes more time, even compared to a single thread.

tmp = [process(X,y,n_features,i=1) for _ in range(1000)]

Based on my previous experience with other libraries on Linux (R gstat, to be precise), the same situation (uncontrolled multithreading) was caused by the Linux implementation of BLAS, but that may not be the case here. In general, the question is: what starts this multithreading, and how can I control it to avoid nested multiprocessing, given that LDA/QDA has no n_jobs parameter?

score 1 · Answer 1 · answered Dec 18 '19 at 11:20

QDA in scikit-learn does not expose n_jobs, so you cannot control this through the estimator itself. However, the extra threads could come from numpy, which does not restrict the number of threads.

There are two ways to limit the number of threads:

set the environment variables OMP_NUM_THREADS, MKL_NUM_THREADS, or OPENBLAS_NUM_THREADS (setting all three makes sure the limit applies whichever backend is in use);
use threadpoolctl, which provides a context manager to set the number of threads.
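
As a minimal sketch of the first option: the helper name limit_native_threads below is hypothetical (it is not part of numpy or scikit-learn). It only sets the three variables named above. These variables are normally read when the native library is first loaded, so the assumption here is that this code runs before numpy or scikit-learn is imported. Whether it takes effect can depend on how numpy was built.

```python
import os

# Hypothetical helper (not a library function): cap the common
# native thread pools by setting their environment variables.
def limit_native_threads(n=1):
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)

# Call this first, at the very top of the script.
limit_native_threads(1)

# Only import numpy / scikit-learn after the variables are set, e.g.:
# import numpy as np
```

The same effect can be had by exporting the variables in the shell before starting Python, which avoids any dependence on import order.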


glemaitre


threadpoolctl works with ```threadpool_limits(limits=1, user_api='blas')```. This is possibly the best solution. Setting os.environ['...'] before numpy is loaded does not work. – jarekj71 Dec 18 '19 at 11:36
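
For illustration, the threadpoolctl call from the comment above could be used as in the sketch below. The random matrix and the SVD are only stand-ins for a BLAS/LAPACK-heavy step (an assumption made for this example); in the question's setting, the call to process(...) would go inside the with block instead.

```python
import numpy as np
from threadpoolctl import threadpool_limits

# Stand-in workload: a dense matrix whose SVD is computed by LAPACK/BLAS.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))

# Inside this block, BLAS thread pools are limited to a single thread;
# the previous limits are restored when the block exits.
with threadpool_limits(limits=1, user_api="blas"):
    singular_values = np.linalg.svd(a, compute_uv=False)
```

Because the limit applies only inside the context manager, each worker of an outer parallel tool (such as ray) can wrap its own call this way without changing global settings.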
