
I have a piece of code below where the joblib.Parallel() returns a list.

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))

Instead of a list, how do I return a generator using joblib.Parallel()?

Edit:

I have updated the code as suggested by @user3666197 in comments below.

import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    yield np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))

But I am getting this error:

TypeError: can't pickle generator objects

Am I missing something? How do I fix this? My main aim here is to reduce memory, as proj can get very large, so I would just like to consume each generator in the list one at a time.

Leockl

1 Answer


Q : "how do I return a generator using joblib.Parallel?"

Given joblib's purpose and implementation, distributing units of code execution across a set of spawned, independent processes (yes, motivated by the performance boost of escaping the central GIL lock, which otherwise serialises execution one GIL-step after another), the syntactic constructor known as joblib.Parallel(...)( delayed()(...) ) can at most make the "remotely" executed processes return the requested generator(s) back to main, where joblib assembles them (outside one's control) into a list.

So the achievable maximum is to receive a list of generators, not any form of deferred execution wrapped on return as a single generator, given the above initial conditions and given that the function fun(), injected via delayed( fun )(...) into the joblib.Parallel( n_jobs = ... )-many "remote" processes, indeed returns one. Note, however, that everything a worker process sends back to main must be pickled in transit, and generator objects cannot be pickled, which is exactly the TypeError the updated code in the question raises.
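The pickling constraint can be reproduced in isolation, without joblib at all (a minimal sketch; make_gen is a hypothetical stand-in for any generator function):

```python
import pickle

def make_gen():
    # a trivial generator function
    yield 1

gen = make_gen()

# joblib's worker processes send results back to the parent by pickling
# them; a generator object carries live frame state and cannot be pickled
try:
    pickle.dumps(gen)
    picklable = True
except TypeError:
    picklable = False

print(picklable)  # False: generators never survive the trip back to main
```

This is why having the worker function yield instead of return does not help: the result still has to cross a process boundary.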


A Bonus Part :

If we were indeed pedantic purists, the only way to receive but "a ( one ) generator using joblib.Parallel()" would be to set n_jobs == 1, which lexically and logically meets the defined goal --to return (but) a (one) generator--, yet would be less efficient and less meaningful than throwing money into the river of Nile...

user3666197
  • Thanks, if I am understanding what you are saying correctly, this can be done by creating a **list of generators** using **”remote”-processes**. Is this right? If yes, how do you do this? – Leockl Mar 08 '20 at 10:07
  • 1
    @Leockl Yes, you understand that correctly. **Step 1)** `def aNextNUM( aNum = 0 ): yield aNum + 1` **Step 2)** assign the results returned from the `n_jobs`-many spawned `joblib.Parallel` constructor processes ( as you do above ), given the remotely-executed function simply does `return aNextNUM`, delivering a generator-object as its respective resulting value, `joblib.Parallel()`-delivered back to the `main` caller, and you are done. – user3666197 Mar 08 '20 at 16:34
  • Thanks @user3666197 for pointing me to the right direction on this. – Leockl Mar 10 '20 at 09:52
  • Hi @user3666197, I have updated my codes as you had suggested but I am getting this error: `TypeError: can't pickle generator objects`. Can you help with what I am missing? I have updated the question. – Leockl Mar 13 '20 at 04:47
  • *(Cit.) **"I have updated the question"*** - well, this is exactly what StackOverflow strongly discourages from doing. Feel free to open a new question for any new direction or on a brand new issue. Updating the original problem definition skews and creeps the focus. Do not do it again and **rather follow the Community Rules. *That's fair, isn't it?*** – user3666197 Mar 14 '20 at 15:00
  • Ok @user3666197 apologies about this. Will keep note of this from now on. – Leockl Mar 15 '20 at 08:20
  • Update: it should now be possible to get a generator to process results while the work is still ongoing, rather than just receiving the finished list at the end. See https://joblib.readthedocs.io/en/latest/auto_examples/parallel_generator.html#sphx-glr-auto-examples-parallel-generator-py (however it seems to be still in development according to the changelog) – E. Körner May 31 '23 at 23:39