There is no n_jobs parameter for GaussianMixture. Meanwhile, whenever I fit the model

from sklearn.mixture import GaussianMixture as GMM
gmm = GMM(n_components=4,
          init_params='random',
          covariance_type='full',
          tol=1e-2,
          max_iter=100,
          n_init=1)
gmm.fit(X, y)

it spawns 16 processes and uses the full CPU power of my 16-CPU machine. I do not want it to do that.

In comparison, KMeans has an n_jobs parameter that controls multiprocessing when there are multiple initializations (n_init > 1). Here the multiprocessing comes out of the blue.

My question is: where is it coming from, and how can I control it?

MaxU - stand with Ukraine
mr.tarsa

1 Answer

You are observing parallel processing in basic linear-algebra operations, sped up by BLAS/LAPACK.

Modifying this is not as simple as setting an n_jobs parameter; it depends on the implementation in use!

Common candidates are ATLAS, OpenBLAS and Intel's MKL.

I recommend checking which one is used first, then act accordingly:

import numpy as np
np.__config__.show()
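
As a complement to the build configuration above, the libraries actually loaded at runtime can be listed. This is a minimal sketch that assumes the third-party threadpoolctl package is installed (it returns an empty list otherwise); the helper name loaded_thread_pools is hypothetical and not part of any library.

```python
def loaded_thread_pools():
    """Return (user_api, internal_api, num_threads) for each native thread pool found."""
    try:
        import numpy  # noqa: F401  (importing numpy loads its BLAS so it can be detected)
        from threadpoolctl import threadpool_info
    except ImportError:
        # Assumption: numpy and threadpoolctl may be missing; report nothing in that case.
        return []
    return [
        (pool.get("user_api"), pool.get("internal_api"), pool.get("num_threads"))
        for pool in threadpool_info()
    ]


# Print one line per detected pool, e.g. the BLAS implementation and its thread count.
for entry in loaded_thread_pools():
    print(entry)
```

The internal_api field should name the implementation (for example openblas or mkl), which tells you which of the settings below is likely to apply.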

Sadly these things can get tricky. A valid environment for MKL for example can look like this (source):

export MKL_NUM_THREADS="2"
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=2"
export OMP_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_DYNAMIC="FALSE"

For ATLAS, it seems, you define this at compile-time.

And according to this answer, the same applies to OpenBLAS.

As the OP tested, it seems you can get away with setting the OpenMP environment variable, which changes the behaviour even for the open-source candidates ATLAS and OpenBLAS (where a compile-time limit is the alternative):

export OMP_NUM_THREADS="4";
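
If exporting the variable in the shell is inconvenient, the same effect can probably be had from inside the script, provided the variables are set before numpy or scikit-learn is imported, since the libraries typically read them once at load time. This is a sketch under that assumption; the helper name limit_native_threads is hypothetical, and which variable is honoured depends on the BLAS build in use.

```python
import os


def limit_native_threads(n):
    """Set the common thread-count environment variables to n."""
    # OMP_NUM_THREADS is the variable the OP tested; the other two are the
    # library-specific variables for OpenBLAS and MKL.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)


# Call this first, then import numpy / sklearn further down in the script.
limit_native_threads(4)
```

Setting the variables after the numerical libraries have already been imported will most likely have no effect on the current process.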
sascha
  • Wow. I checked active BLAS library being used by numpy and then setting variable with `export OMP_NUM_THREADS="4";` before running the script did the trick. Thank you so much! – mr.tarsa Dec 28 '17 at 12:46
  • @tarashypka And which one is in use? MKL? Or did this even work for some of the other candidates? – sascha Dec 28 '17 at 12:46
  • Output is something like this `libraries = ['openblas', 'openblas']` – mr.tarsa Dec 28 '17 at 12:47