I am trying to run ElasticNet from scikit-learn on a machine with multiple CPUs. However, I need the ElasticNet fit to use only one CPU, since I need to run other fitting routines in parallel on the remaining CPUs. Whenever the thread containing ElasticNet starts the fit, it quickly takes over any free capacity on all CPUs instead of just the one it's called on. Because other routines are already running on these machines, ElasticNet oversubscribes them and slows everything down tremendously, including itself. I need these routines to run in parallel, so I cannot just run the ElasticNet fit serially ahead of time.
Unlike other regression functions (linear, logistic...) in sklearn, there is no n_jobs argument for ElasticNet. Reading the documentation, it appears that ElasticNet defaults to the n_jobs specified in joblib.parallel_backend, which itself defaults to n_jobs=-1, i.e. all available CPUs.
I am trying to figure out the proper method for specifying n_jobs in parallel_backend so that it will override the default for ElasticNet. Below are three attempts to change n_jobs that have not worked so far.
Attempt 1
from joblib import parallel_backend
from sklearn.linear_model import ElasticNet

with parallel_backend('loky', n_jobs=1):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                       normalize=False, copy_X=True, max_iter=10000, tol=10,
                       random_state=42, precompute=False, warm_start=False,
                       positive=False, selection='cyclic')
    model.fit(predictors, response)
Attempt 2
from sklearn.utils import parallel_backend
from sklearn.linear_model import ElasticNet

with parallel_backend('loky', n_jobs=1):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                       normalize=False, copy_X=True, max_iter=10000, tol=10,
                       random_state=42, precompute=False, warm_start=False,
                       positive=False, selection='cyclic')
    model.fit(predictors, response)
Neither Attempt 1 nor Attempt 2 throws an error, but neither appears to change n_jobs from the default of using every available CPU. ElasticNet still takes over all available capacity across all CPUs and quickly oversubscribes the machines.
Attempt 3
This is my first time using joblib directly, so I've been reading the documentation on parallelization with joblib. Most of the example routines placed inside the parallel_backend context manager are wrapped in the Parallel() helper class.
Following the examples, I modified Attempt 1 in the following way:
from joblib import Parallel, parallel_backend
from sklearn.linear_model import ElasticNet

with parallel_backend('loky', n_jobs=1):
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                       normalize=False, copy_X=True, max_iter=10000, tol=10,
                       random_state=42, precompute=False, warm_start=False,
                       positive=False, selection='cyclic')
    Parallel(n_jobs=1)(model.fit(predictors, response))
However, when running Attempt 3, I get the following error message:
TypeError: 'ElasticNet' object is not iterable
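For reference, the pattern used in the joblib examples looks roughly like the sketch below. Parallel expects an iterable of delayed(...) calls rather than the return value of a call, which I suspect is why Attempt 3 fails: model.fit(...) runs immediately and returns the fitted ElasticNet object, which Parallel then tries to iterate over. The function square and its inputs are made up for illustration and are not part of my code.

```python
from joblib import Parallel, delayed


def square(x):
    # Hypothetical stand-in for real work; not part of the original code.
    return x * x


# Parallel consumes an iterable of delayed(...) tasks; with n_jobs=1 the
# tasks are run one after another in the calling process.
results = Parallel(n_jobs=1)(delayed(square)(i) for i in range(5))
print(results)
```

Even in this form, n_jobs only limits the number of joblib workers, so I am not sure it would restrict whatever threading happens inside a single fit call.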
Does anyone know how to set n_jobs=1 for sklearn's ElasticNet? There must be some way to do this because ElasticNetCV has n_jobs as a possible argument. Any help with this is greatly appreciated!