Questions tagged [joblib]

Joblib is a set of tools to provide lightweight pipelining in Python.

https://joblib.readthedocs.io/en/latest/

715 questions
125 votes · 10 answers

ImportError: cannot import name 'joblib' from 'sklearn.externals'

I am trying to load my saved model from S3 using joblib: import pandas as pd import numpy as np import json import subprocess import sqlalchemy from sklearn.externals import joblib ENV = 'dev' model_d2v = load_d2v('model_d2v_version_002', ENV) def…
Praneeth Sai · 1,421
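The `sklearn.externals.joblib` shim was deprecated in scikit-learn 0.21 and removed in 0.23, so the usual fix is to depend on the standalone `joblib` package and import it directly. A minimal sketch (the filename and the stand-in model object are hypothetical):

```python
import joblib  # pip install joblib; no longer re-exported by sklearn.externals

# Persist and reload an object exactly as sklearn.externals.joblib used to.
model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a real fitted model
joblib.dump(model, "model_d2v_version_002.joblib")
restored = joblib.load("model_d2v_version_002.joblib")
```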
94 votes · 4 answers

What does the delayed() function do (when used with joblib in Python)?

I've read through the documentation, but I don't understand what is meant by: The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax. I'm using it to iterate over the list I want to…
orrymr · 2,264
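In short, `delayed(f)(*args, **kwargs)` does not call `f` at all: it packages the call as a `(function, args, kwargs)` tuple that `Parallel` later executes in a worker. A quick sketch:

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(sqrt)(16) builds the tuple (sqrt, (16,), {}) instead of calling sqrt
task = delayed(sqrt)(16)

# Parallel consumes a stream of such tuples and runs them across workers,
# returning results in submission order.
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(5))
```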
70 votes · 9 answers

How can we use tqdm in a parallel execution with joblib?

I want to run a function in parallel, and wait until all parallel nodes are done, using joblib. Like in the example: from math import sqrt from joblib import Parallel, delayed Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10)) But, I want…
Dror Hilman · 6,837
64 votes · 5 answers

Joblib UserWarning while trying to cache results

I get the following UserWarning when trying to cache results using joblib: import numpy from tempfile import mkdtemp cachedir = mkdtemp() from joblib import Memory memory = Memory(cachedir=cachedir, verbose=0) @memory.cache def…
user308827 · 21,227
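Note that joblib's API has moved on since this excerpt was written: `Memory(cachedir=...)` became `Memory(location=...)`. A minimal caching sketch with the current signature, using a side-effect list just to demonstrate that the cached body runs only on a miss:

```python
from tempfile import mkdtemp

from joblib import Memory

# The old Memory(cachedir=...) keyword is now Memory(location=...)
memory = Memory(location=mkdtemp(), verbose=0)

calls = []

@memory.cache
def square(x):
    calls.append(x)  # executes only on a cache miss
    return x * x

first = square(4)   # computed and persisted to disk
second = square(4)  # served from the on-disk cache; the body does not run
```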
58 votes · 11 answers

Tracking progress of joblib.Parallel execution

Is there a simple way to track the overall progress of a joblib.Parallel execution? I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. However, to do that, whenever Parallel finishes a…
Cerin · 60,957
43 votes · 1 answer

Out-of-core processing of sparse CSR arrays

How can one apply some function in parallel on chunks of a sparse CSR array saved on disk using Python? Sequentially this could be done e.g. by saving the CSR array with joblib.dump, opening it with joblib.load(.., mmap_mode="r"), and processing the…
rth · 10,680
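A sketch of the dump/memory-map pattern the question describes, assuming scipy is available (the file path is hypothetical): `mmap_mode="r"` memory-maps the dense arrays backing the CSR matrix, so worker processes read pages on demand instead of copying the whole matrix.

```python
import os
from tempfile import mkdtemp

from joblib import Parallel, delayed, dump, load
from scipy import sparse

X = sparse.random(1000, 20, density=0.1, format="csr", random_state=0)
path = os.path.join(mkdtemp(), "X.joblib")
dump(X, path)

# mmap_mode="r" memory-maps X.data / X.indices / X.indptr on load
X_mm = load(path, mmap_mode="r")

def chunk_sum(X, start, stop):
    # any per-chunk function works here; summing keeps the sketch checkable
    return X[start:stop].sum()

chunk_sums = Parallel(n_jobs=2)(
    delayed(chunk_sum)(X_mm, i, i + 250) for i in range(0, 1000, 250)
)
```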
36 votes · 8 answers

How to properly pickle sklearn pipeline when using custom transformer

I am trying to pickle a sklearn machine-learning model and load it in another project. The model is wrapped in a pipeline that does feature encoding, scaling, etc. The problem starts when I want to use self-written transformers in the pipeline for…
spiral · 381
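The usual cause: pickle stores custom classes by import path, so a transformer defined in a notebook or script `__main__` cannot be resolved when another project unpickles the pipeline. Keeping the class in an importable module shared by both projects fixes it; a sketch (the module name and transformer are hypothetical):

```python
import joblib
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# In real use, put this class in e.g. my_transformers.py and import it from
# there in BOTH projects, so unpickling can resolve it by module path.
class Doubler(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * value for value in row] for row in X]

pipe = Pipeline([("double", Doubler()), ("clf", LogisticRegression())])
pipe.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
joblib.dump(pipe, "pipeline.joblib")
restored = joblib.load("pipeline.joblib")
```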
26 votes · 6 answers

KeyError when loading pickled scikit-learn model using joblib

I have an object that contains within it two scikit-learn models, an IsolationForest and a RandomForestClassifier, that I would like to pickle and later unpickle and use to produce predictions. Apart from the two models, the object contains a couple…
haroba · 2,120
24 votes · 3 answers

Printed output not displayed when using joblib in jupyter notebook

So I am using joblib to parallelize some code and I noticed that I couldn't print things when using it inside a Jupyter notebook. I tried doing the same example in IPython and it worked perfectly. Here is a minimal (not) working example to…
24 votes · 2 answers

How to write to a shared variable in python joblib

The following code parallelizes a for-loop. import networkx as nx; import numpy as np; from joblib import Parallel, delayed; import multiprocessing; def core_func(repeat_index, G, numpy_arrary_2D): for u in G.nodes(): …
user3813057 · 891
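With joblib's default process-based backend, each worker mutates its own copy of the array, so the writes are lost. Requesting a shared-memory (thread-based) backend via `require="sharedmem"` makes the writes visible to the caller, at the cost of running under the GIL; returning results and merging them afterwards is often the better design. A sketch:

```python
import numpy as np
from joblib import Parallel, delayed

shared = np.zeros(8)

def write_cell(i, out):
    # visible to the caller only because the backend shares memory
    out[i] = i * i

Parallel(n_jobs=2, require="sharedmem")(
    delayed(write_cell)(i, shared) for i in range(8)
)
```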
24 votes · 2 answers

Why is it important to protect the main loop when using joblib.Parallel?

The joblib docs contain the following warning: Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this: import…
Joe · 3,831
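The background: with the spawn start method used on Windows, each worker process re-imports the main module; without the guard, that re-import would reach the `Parallel` call and spawn workers recursively. The recommended shape:

```python
from math import sqrt

from joblib import Parallel, delayed

def run():
    return Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))

# Workers re-import this module on Windows; the guard keeps them from
# re-entering run() and spawning subprocesses of their own.
if __name__ == "__main__":
    results = run()
```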
22 votes · 3 answers

How do I store a TfidfVectorizer for future use in scikit-learn?

I have a TfidfVectorizer that vectorizes a collection of articles, followed by feature selection. vectroizer = TfidfVectorizer() X_train = vectroizer.fit_transform(corpus) selector = SelectKBest(chi2, k = 5000 ) X_train_sel =…
user2161903 · 577
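The key point is to persist the *fitted* vectorizer itself, since refitting later on new text would produce a different vocabulary and feature order. A joblib sketch (corpus and filename hypothetical; persisting a fitted `SelectKBest` works the same way):

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat", "the dog ran", "the cat ran"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(corpus)
joblib.dump(vectorizer, "tfidf.joblib")  # persist the fitted object

# Later / in another process: reload and reuse the same vocabulary
vectorizer = joblib.load("tfidf.joblib")
X_new = vectorizer.transform(["the dog sat"])
```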
21 votes · 7 answers

spacy with joblib library generates _pickle.PicklingError: Could not pickle the task to send it to the workers

I have a large list of sentences (~7 millions), and I want to extract the nouns from them. I used joblib library to parallelize the extracting process, like in the following: import spacy from tqdm import tqdm from joblib import Parallel,…
Minions · 5,104
21 votes · 2 answers

How to save a scikit-learn pipeline with a Keras regressor inside to disk?

I have a scikit-learn pipeline with a KerasRegressor in it: estimators = [ ('standardize', StandardScaler()), ('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=5, batch_size=1000, verbose=1)) ] pipeline = Pipeline(estimators) After,…
Dror Hilman · 6,837
21 votes · 3 answers

Python scikit learn n_jobs

This is not a real issue, but I'd like to understand: running sklearn from the Anaconda distribution on a Win7 4-core 8 GB system, fitting a KMeans model on a 200,000 samples × 200 values table. Running with n_jobs = -1: (after adding the if __name__ ==…
Bruno Hanzen · 351