Here is the code I have:

import pandas as pd
import multiprocessing as mp

CPU = 4

inp = pd.DataFrame({ 'col': ['a', 'b'] })

def test(dataframe):
    df = dataframe.copy()

    def worker(data):
        print('worker')

    def callback(data):
        print('callback')

    pool = mp.Pool(CPU)

    for idx, row in df.iterrows():
        print((idx, row['col']))
        pool.apply_async(worker, args=[(idx, row['col'])], callback=callback)

    pool.close()
    pool.join()

    return df

test(inp)

It works as expected when I run it at the top level (without wrapping it in the test function), but once it is enclosed in another function, the worker and callback are simply never called.

Here is output I receive with test function:

(0, 'a')
(1, 'b')

Without:

(0, 'a')
(1, 'b')
worker
worker
callback
callback

So the question is - how can I make it work inside another function?

ShadowRanger
eawer

1 Answer

From the multiprocessing module documentation:

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).


Note: Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines; however, it is worth pointing out here.
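The importability requirement matters because the pool sends the worker function to its child processes by pickling it, and pickle stores functions by name rather than by value. Here is a minimal sketch of that check using only the standard pickle module; the names top_level_worker, make_nested_worker and can_pickle are hypothetical helpers for illustration, not part of the question's code:

```python
import pickle


def top_level_worker(data):
    # Defined at module level, so pickle can refer to it by name.
    return data


def make_nested_worker():
    # Hypothetical helper: returns a function defined inside another function.
    def worker(data):
        return data
    return worker


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Nested (local) functions cannot be looked up by name.
        return False


if __name__ == '__main__':
    print(can_pickle(top_level_worker))
    print(can_pickle(make_nested_worker()))
```

The module-level function should pickle fine, while the nested one should not, which is the same restriction apply_async runs into when worker lives inside test.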

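One proper way is to define worker and callback at module level. The pool has to pickle the worker function to send it to the child processes, and a function nested inside test cannot be pickled. That error is swallowed unless you call get() on the result or pass an error_callback, which is why nothing was printed: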

import pandas as pd
import multiprocessing as mp

CPU = 4
inp = pd.DataFrame({'col': ['a', 'b']})

def worker(data):
    print(data)
    print('worker')

def callback(data):
    print('callback')

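def test(dataframe):
    df = dataframe.copy()

    with mp.Pool(CPU) as pool:
        for idx, row in df.iterrows():
            result = pool.apply_async(worker, args=[(idx, row['col'])], callback=callback)
            # Wait here: leaving the with-block terminates the pool,
            # which would drop any task that has not finished yet.
            result.wait()

    return df

if __name__ == '__main__':
    # Guard the entry point so child processes can import this module safely.
    test(inp)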

The output:

(0, 'a')
worker
(1, 'b')
worker
callback
callback
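
If the helpers really have to stay nested inside one parent function, one possible workaround (my assumption, not something from the documentation quoted above) is a thread pool. multiprocessing.pool.ThreadPool has the same apply_async interface but runs tasks in threads of the same process, so nothing is pickled and nested functions work. The trade-off is that threads share the GIL, so this mainly helps I/O-bound work. In this sketch, run_nested is a hypothetical name and rows stands in for the (idx, row['col']) pairs that df.iterrows() would produce:

```python
from multiprocessing.pool import ThreadPool


def run_nested(rows, n_threads=4):
    """Run nested worker/callback helpers on a thread pool.

    rows is an iterable of (idx, value) pairs.
    """
    collected = []

    def worker(data):
        # Nested helper: fine here because threads do not pickle the function.
        idx, value = data
        return (idx, value.upper())

    def callback(result):
        # Called in the parent process once a task has finished.
        collected.append(result)

    with ThreadPool(n_threads) as pool:
        pending = [
            pool.apply_async(worker, args=[item], callback=callback)
            for item in rows
        ]
        # Wait for every task before the with-block terminates the pool.
        for task in pending:
            task.wait()

    # Completion order is not guaranteed, so sort for a stable result.
    return sorted(collected)


if __name__ == '__main__':
    print(run_nested([(0, 'a'), (1, 'b')]))
```

Unlike the process-based version, all tasks are submitted first and waited on afterwards, so they can overlap.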
RomanPerekhrest
  • Well, what I wanted to do is nest all helper functions under one parent, in this particular case under the `test` function. Is that possible? As I mentioned, if all functions live at the top level, the code from my example works as well. – eawer Jun 13 '18 at 16:22
  • @Shtirlits, https://stackoverflow.com/questions/46266803/multiprocessing-pool-not-working-in-nested-functions/46266853#46266853 – RomanPerekhrest Jun 13 '18 at 16:47