
I'm trying to make a Python 'JobQueue' that performs computationally intensive tasks asynchronously on a local machine, with a mechanism that returns the results of each task to the main process. Python's multiprocessing.Pool has an apply_async() method that meets those requirements by accepting an arbitrary function, its arguments, and callback functions that receive the results. For example...

    import multiprocessing

    pool = multiprocessing.Pool(poolsize)
    pool.apply_async(func, args=args, 
                     callback=mycallback,
                     error_callback=myerror_callback)

The only problem is that the function given to apply_async() must be serializable with pickle, and the functions I need to run concurrently are not. The reason is that the target function is a method of an object that contains an IDL object, for example:

    from idlpy import IDL
    self.idl_obj = IDL.obj_new('ImageProcessingEngine')

This is the error message received at the pool.apply_async() line:

'Can't pickle local object 'IDL.__init__.<locals>.run''
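The error can be reproduced without IDL at all: pickle refuses to serialize any locally defined function, which is exactly what `IDL.__init__.<locals>.run` is. A minimal sketch (the names `make_runner` and `run` are placeholders, not part of idlpy):

```python
import pickle

def make_runner():
    # a function defined inside another function,
    # analogous to IDL.__init__.<locals>.run
    def run():
        return 42
    return run

runner = make_runner()

error = ""
try:
    pickle.dumps(runner)
except (pickle.PicklingError, AttributeError) as exc:
    error = str(exc)

print(error)  # e.g. "Can't pickle local object 'make_runner.<locals>.run'"
```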

What I tried

I made a simple implementation of a JobQueue that works perfectly fine in Python 3.6+, provided the Job object and its run() method are picklable. I like how the main process can receive arbitrarily complex data returned from the asynchronously executed function via a callback.

I tried to use pathos.pools.ProcessPool, since it uses dill instead of pickle. However, it doesn't have a method similar to apply_async(). Are there any other options, or third-party libraries, that provide this functionality using dill or by some other means?

Tom Jordan
1 Answer


How about creating a stub function that instantiates the IDL endpoint as a function static variable?

Please note that this is only a sketch of the code, as it is hard to tell from the question whether you are passing IDL objects as parameters to the function you run in parallel or whether they serve another purpose.

    from idlpy import IDL

    def stub_fun(paramset):
        if not hasattr(stub_fun, 'idl_obj'):  # instantiate once per worker process
            stub_fun.idl_obj = IDL.obj_new('ImageProcessingEngine')

        return stub_fun.idl_obj(paramset)
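The caching behaviour of the stub is easy to verify without IDL. In this sketch, a hypothetical `Engine` class stands in for IDL.obj_new('ImageProcessingEngine'): it is constructed on the first call to the stub and reused on every subsequent call in the same process.

```python
class Engine:
    # hypothetical stand-in for IDL.obj_new('ImageProcessingEngine')
    instances = 0

    def __init__(self):
        Engine.instances += 1

    def __call__(self, paramset):
        return [p * 2 for p in paramset]

def stub_fun(paramset):
    if not hasattr(stub_fun, 'engine'):  # instantiate once per process
        stub_fun.engine = Engine()
    return stub_fun.engine(paramset)

print(stub_fun([1, 2]))  # [2, 4]
print(stub_fun([3]))     # [6]
print(Engine.instances)  # 1 -- constructed only on the first call
```

In a multiprocessing.Pool, each worker process would build its own Engine the first time it executes stub_fun, so nothing ever needs to cross the pickle boundary.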
sophros
  • Thanks! Although this answer doesn't directly address my question about a pickle alternative, it is an interesting idea that eliminates serialization altogether. It's thread-safe since every worker process in the pool instantiates its own IDL object the first time the worker process receives a 'stub_fun' to execute. In each subsequent execution, the worker reuses the IDL object assigned to the function. The advantages of this approach vs. pickle: it works on all Python objects, which pickle doesn't, and it eliminates the computational cost of serialization. – Tom Jordan Jul 26 '19 at 21:27