
This may be a very easy question, but it has definitely worn me out. To use multiprocessing, I wrote the following code. The main block creates two processes that both use the same function, called prepare_input_data(), but process different input datasets. This function must return multiple objects and values for each input, to be used in the next steps of the code (not included here).

What I want is to get more than one value or object as a return from the function I am using in multiprocessing.

from multiprocessing import Process, Queue, current_process
import time
# Preprocessing, loading_layer and find_roundabouts come from my own
# modules (not included here)

def prepare_input_data(inputdata_address, temporary_address, output):
    p = current_process()
    name = p.name
    data_address = inputdata_address
    layer = loading_layer(data_address)

    preprocessing_object = Preprocessing(layer)
    nodes = preprocessing_object.node_extraction(layer)
    tree = preprocessing_object.index_nodes()
    roundabouts_dict, roundabouts_tree = find_roundabouts(layer.address, layer, temporary_address)

    #return layer, nodes, tree, roundabouts_dict, roundabouts_tree
    #return [layer, nodes, tree, roundabouts_dict, roundabouts_tree]
    # tag the payload with the worker's name so the parent can tell
    # the results apart regardless of completion order
    output.put((name, [layer, nodes, tree, roundabouts_dict, roundabouts_tree]))


if __name__ == '__main__':
    print "the data preparation in multi processes starts here"
    output = Queue()
    start_time = time.time()
    processes =[]
    #outputs=[]
    ref_process = Process(name ="reference", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/NVDB_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/",output)) 
    cor_process = Process(name ="corresponding", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/OSM_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/",output))
    #outputs.append(ref_process.start)
    #outputs.append(cor_process.start)
    ref_process.start()    # note the parentheses: .start without () is a no-op reference
    cor_process.start()
    processes.append(ref_process)
    processes.append(cor_process)
    # drain the queue *before* joining: a child process cannot exit while
    # data it has put on the queue is still unsent, so joining first can deadlock
    results = {}
    for p in processes:
        name, values = output.get()
        results[name] = values
    for p in processes:
        p.join()

    print "the whole data preparation took ", time.time() - start_time
    ########################
    #ref_info = outputs[0]
    # ref_nodes=ref_info[0]

Previous ERROR: when I used return, ref_info[0] had NoneType.

ERROR: based on the answer here, I changed it to a Queue object passed to the function; I then used put() to add the results and get() to retrieve them for further processing.

Traceback (most recent call last):
  File "C:\Python27\ArcGISx6410.2\Lib\multiprocessing\queues.py", line 262, in _feed
    send(obj)
UnpickleableError: Cannot pickle <type 'geoprocessing spatial reference object'> objects
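
One workaround I'm considering (a sketch only -- it assumes loading_layer() can rebuild the layer from its address string, and that nodes, tree, etc. are themselves picklable, neither of which I've verified yet) is to keep the unpicklable layer object off the queue entirely and reload it in the parent:

def prepare_input_data(inputdata_address, temporary_address, output):
    name = current_process().name
    layer = loading_layer(inputdata_address)
    preprocessing_object = Preprocessing(layer)
    nodes = preprocessing_object.node_extraction(layer)
    tree = preprocessing_object.index_nodes()
    roundabouts_dict, roundabouts_tree = find_roundabouts(layer.address, layer, temporary_address)
    # ship the address instead of the layer itself; the parent can call
    # loading_layer() again, so the unpicklable object never hits the queue
    output.put((name, [inputdata_address, nodes, tree, roundabouts_dict, roundabouts_tree]))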

Could you please help me work out how to return more than one value from a function in multiprocessing?

msc87
  • You seem to have forgotten to actually ask a question! – Marcus Müller Feb 04 '15 at 18:11
  • @MarcusMüller: it might be clearer now. – msc87 Feb 05 '15 at 09:14
  • _... as a return from the function..._ but you're not returning _anything_ at the moment, and nor are you looking at the values returned from anywhere. Can you write a minimal and self-contained example showing what you actually want to do? Most of your posted code seems to be unrelated to the multiprocessing question. – Useless Feb 05 '15 at 11:39
  • @Useless: I had return values with the return syntax in my function, but then I changed it to Queue.put() since I kept getting NoneType. – msc87 Feb 05 '15 at 12:10
  • The question is definitely not clear at all... but looking at your error, it seems that you passed an object that cannot be "pickled" (serialized in `python`) to the multiprocessing queue. You can either pass a picklable value (strings, integers, for instance), or... *nasty solution* set it in a global variable. – finiteautomata Feb 05 '15 at 13:38
  • Aha, so now you have an error. Perhaps you could ask a question about how to pickle `geoprocessing spatial reference object`s where you show what they actually are? – Useless Feb 05 '15 at 13:58
  • @geekazoid: I think the problem is mainly that in all (most) examples the function is just printing. Now I want to return several values, none of which is a geoprocessing object, as you can see in the code. I don't know how to return several values. Should I use pickle, or can a simple return do that for me? Should I use pool.map, or is Process OK? – msc87 Feb 05 '15 at 14:03
  • Well, I did my best to make it clear; I have no idea how to ask it in any other way. It is simple: I want multiple returns from my function. – msc87 Feb 10 '15 at 12:36
  • The construct you have -- `output.put( [list, of, return, values] )` *should work*. If it doesn't work, the problem is *not* with putting a list on the queue, but with one of the things *in* the list. And your question is still unclear because you *talk* about returning multiple values, which makes us think of `return (a, b, c)` -- which should also work with `Pool.map` at least, btw -- but then your code uses queues, so we are confused. – zwol Feb 16 '15 at 17:20

3 Answers


Parallel programming with shared state is a rocky road that even experienced programmers get wrong. A much more beginner-friendly method is to copy data around. This is the only way to move data between subprocesses (not quite true, but that's an advanced topic).

Citing https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes, you'll want to set up a multiprocessing.Queue to fill with your returned data for each of your subprocesses. Afterwards, you can pass the queue to the next stage to be read from.

For multiple different datasets, such as your layer, nodes, tree, etc., you can use multiple queues to differentiate each return value. It may seem a bit cluttered to use a queue for each, but it's simple, understandable, and safe.
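
Here's a minimal, runnable sketch of the one-queue-per-value idea (the compute_* helpers are hypothetical stand-ins for your real processing):

from multiprocessing import Process, Queue

def compute_nodes(path):
    return ['node1', 'node2']   # stand-in for your real node extraction

def compute_tree(path):
    return {'root': path}       # stand-in for your real index building

def worker(path, nodes_q, tree_q):
    # one queue per return value keeps each result clearly separated
    nodes_q.put(compute_nodes(path))
    tree_q.put(compute_tree(path))

if __name__ == '__main__':
    nodes_q, tree_q = Queue(), Queue()
    p = Process(target=worker, args=('input.shp', nodes_q, tree_q))
    p.start()
    nodes = nodes_q.get()   # read before join() so the queue can drain
    tree = tree_q.get()
    p.join()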

Hope that helps.

  • This does not address how to return multiple values... which is what the OP asked. –  Feb 16 '15 at 17:00
  • Each value is a queue. If that isn't multiple values returned then I don't understand the question. I'll edit my answer to clarify this. – JumpandSpintoWin Feb 16 '15 at 17:12

If you use jpe_types.paralel's Process, it will return the return value of the process's target function, like so:

import jpe_types.paralel


def fun():
    return 4, 23.4, "hi", None

if __name__ == "__main__":
    
    p = jpe_types.paralel.Process(target = fun)
    p.start()
    print(p.join())

Otherwise, with the standard multiprocessing library, you could use a Pipe:

import multiprocessing as mp

def fun(conn):
    # send the "return values" back through the pipe as a single tuple
    conn.send((1, 23, "hi", None))

if __name__ == "__main__":
    processes = []
    for i in range(2):
        sender, receiver = mp.Pipe()
        p = mp.Process(target=fun, args=(sender,))
        p.start()
        processes.append((p, receiver))

    results = []

    for p, receiver in processes:
        p.join()
        results.append(receiver.recv())
    print(results)

Using a separate connection for each process guarantees that the returned values don't get scrambled.

Julian wandhoven

If you are looking to get multiple return values from multiprocessing, then you can do that. Here's a simple example, first in serial python, then with multiprocessing:

>>> a,b = range(10), range(10,0,-1)
>>> import math
>>> map(math.modf, (1.*i/j for i,j in zip(a,b)))
[(0.0, 0.0), (0.1111111111111111, 0.0), (0.25, 0.0), (0.42857142857142855, 0.0), (0.6666666666666666, 0.0), (0.0, 1.0), (0.5, 1.0), (0.3333333333333335, 2.0), (0.0, 4.0), (0.0, 9.0)]
>>> 
>>> from multiprocessing import Pool
>>> res = Pool().imap(math.modf, (1.*i/j for i,j in zip(a,b)))
>>> for i,ai in enumerate(a):
...   x,y = res.next()
...   print("{x},{y} = modf({u}/{d})").format(x=x,y=y,u=ai,d=b[i])
... 
0.0,0.0 = modf(0/10)
0.111111111111,0.0 = modf(1/9)
0.25,0.0 = modf(2/8)
0.428571428571,0.0 = modf(3/7)
0.666666666667,0.0 = modf(4/6)
0.0,1.0 = modf(5/5)
0.5,1.0 = modf(6/4)
0.333333333333,2.0 = modf(7/3)
0.0,4.0 = modf(8/2)
0.0,9.0 = modf(9/1)

So to get multiple return values from a function with multiprocessing, you only need a function that returns multiple values… you will just get the values back as a list of tuples.

The major issue with multiprocessing, as you can see from your error… is that most functions don't serialize. So, if you really want to do what it seems like you want to do… I'd strongly suggest you use pathos (as discussed below). The largest barrier you will have with multiprocessing is that the functions you pass as the target must be serializable. There are several modifications you can make to your prepare_input_data function… the first of which is to make sure it is encapsulated. If your function is not fully encapsulated (e.g. it has name-reference lookups outside of its own scope), then it probably won't pickle with pickle. That means you need to include all imports inside the target function and pass any other variables in through the function input. The error you are seeing (UnpickleableError) is due to your target function and its dependencies not being serializable -- and not because you can't return multiple values from multiprocessing.
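
As a small, hypothetical illustration of what "encapsulated" means here (this is not your function, just the shape of one that pickles cleanly):

from multiprocessing import Process, Queue

def encapsulated_target(path, output):
    # fully encapsulated: the import happens inside the function and
    # everything else arrives through the arguments, so there are no
    # name lookups outside the function's own scope
    import os
    output.put(os.path.basename(path))

if __name__ == '__main__':
    q = Queue()
    p = Process(target=encapsulated_target, args=('some/dir/file.shp', q))
    p.start()
    print(q.get())   # -> 'file.shp'
    p.join()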

While I'd encapsulate the target function anyway as a matter of good practice, it can be a bit tedious and could slow your code down a hair. I also suggest you convert your code to use dill and pathos.multiprocessing -- dill is an advanced serializer that can pickle almost all python objects, and pathos provides a multiprocessing fork that uses dill. That way, you can pass most python objects through the pipe (i.e. apply) or the map available from the Pool object, and not sweat too hard refactoring your code so that plain old pickle and multiprocessing can handle it.
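
For instance, here's the kind of object dill handles that plain pickle doesn't (a quick sketch, assuming dill is installed):

import dill

square = lambda x: x * x
# pickle.dumps(square) raises PicklingError -- the standard pickle can't
# serialize lambdas -- but dill serializes and restores them fine
payload = dill.dumps(square)
restored = dill.loads(payload)
print(restored(6))   # -> 36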

Also, I'd use an asynchronous map instead of doing what you are doing above. pathos.multiprocessing has the ability to take multiple arguments in the map function, so you don't need to wrap them in a tuple of args as you've done above. The interface should be much cleaner with an asynchronous map, and you can return multiple values if you need to… just pack them in a tuple.

Here are some examples that demonstrate what I'm referring to above.

Return multiple values:

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> def addsub(x,y):
...   return x+y, x-y
... 
>>> a,b = range(10),range(-10,10,2)
>>> res = Pool().imap(addsub, a, b)
>>> 
>>> for i,ai in enumerate(a):
...   add,sub = res.next()
...   print("{a} + {b} = {p}; {a} - {b} = {m}".format(a=ai,b=b[i],p=add,m=sub))
... 
0 + -10 = -10; 0 - -10 = 10
1 + -8 = -7; 1 - -8 = 9
2 + -6 = -4; 2 - -6 = 8
3 + -4 = -1; 3 - -4 = 7
4 + -2 = 2; 4 - -2 = 6
5 + 0 = 5; 5 - 0 = 5
6 + 2 = 8; 6 - 2 = 4
7 + 4 = 11; 7 - 4 = 3
8 + 6 = 14; 8 - 6 = 2
9 + 8 = 17; 9 - 8 = 1
>>> 
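
And a sketch of the asynchronous variant mentioned above, reusing addsub, a, and b from the previous example (amap returns a handle immediately; get() blocks until all results arrive):

>>> res = Pool().amap(addsub, a, b)
>>> res.get()
[(-10, 10), (-7, 9), (-4, 8), (-1, 7), (2, 6), (5, 5), (8, 4), (11, 3), (14, 2), (17, 1)]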

Asynchronous map: Python multiprocessing - tracking the process of pool.map operation

pathos: Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()

pathos: What can multiprocessing and dill do together?

We still can't run your code… but if you post code that can be run, it would be easier to help edit your code (using the pathos fork and the asynchronous map, or otherwise).

FYI: A release for pathos is a little bit overdue (i.e. late), so if you want to try it, it's best to get the code here: https://github.com/uqfoundation

Mike McKerns
  • The OP obviously is a beginner and fails to grasp fundamental concepts about controlling concurrency and the `multiprocessing` components. In such a case, I do not think that it is particularly helpful to propose usage of two third-party packages. Better to explain, pedagogically, how to solve this very basic problem *conceptually*, and then propose standard library means for the actual implementation. – Dr. Jan-Philip Gehrcke Feb 11 '15 at 21:56
  • I'd agree with you in general. However, the above packages have the same interface for what I'm suggesting (or very nearly if not completely)… and I've suggested those packages in part b/c I'm the author and in part because they are cleaner and conceptually easier to use for the beginner than the python standard library packages are. – Mike McKerns Feb 12 '15 at 00:06