I'm having trouble understanding how to use multiprocessing together with the map function.

I'll try to describe briefly:

First, I have a list, for instance:

INPUT_MAGIC_DATA_STRUCTURE = [
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
]

I also have a function that currently parses this list using some specific internal logic:

def parse(api_client1, api_client2):
    for row in INPUT_MAGIC_DATA_STRUCTURE:
        parsed_repo_row = ...  # some logic with row
        OUTPUT_MAGIC_DATA_STRUCTURE.append(parsed_repo_row)

Finally, I've read that there are ways to run this in parallel instead of in a plain for loop:

from multiprocessing import Pool

pool = Pool(10)
pool.map(<???>, INPUT_MAGIC_DATA_STRUCTURE)

??? is the part I cannot figure out: how do I turn the body of the "for row in INPUT_MAGIC_DATA_STRUCTURE" loop in parse() into the first argument of pool.map(), while still passing parse()'s own arguments, api_client1 and api_client2?

Could you help me?

Thanks in advance.

UPD:

I've already tried:

pool = Pool(10)
pool.map(parse(magic_parser, magic_staff), INPUT_MAGIC_DATA_STRUCTURE)

However, when the interpreter reaches the second line, it blocks there and runs only a single instance of parse() (the logging output shows the parsed rows one by one: 1, 2, 3, 4, 5).

Vlad Rudskoy
  • So, you'll have many processes running on a list modifying it? – Tim Givois Mar 12 '17 at 16:31
  • @TimGivois Suppose that these processes only append to the list. Is that not possible? – Vlad Rudskoy Mar 12 '17 at 16:39
  • It is possible if you are only appending to the list, because appends are thread safe, which means that you can have x processes running concurrently and modifying the list without any problems: http://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments – Tim Givois Mar 12 '17 at 16:42

2 Answers

Put (some logic with row) in a function that takes one row and returns the parsed result:

def row_logic(row):
    result = ...  # some logic with row
    return result

Pass the function to Pool.map:

pool = Pool(10)
pool.map(row_logic, INPUT_MAGIC_DATA_STRUCTURE)
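
Note that pool.map(parse(magic_parser, magic_staff), ...) from the question's update calls parse() first, in the parent process, which is why the rows are handled one by one. If the per-row function also needs api_client1 and api_client2, one option is to bind them with functools.partial so that Pool.map only has to supply the row. This is a minimal sketch, not the asker's real code: it assumes the clients can be pickled, and the body of row_logic, the string stand-ins for the clients and the processes parameter are all placeholders.

```python
from functools import partial
from multiprocessing import Pool

INPUT_MAGIC_DATA_STRUCTURE = [
    ['https://github.com', 'Owner', 'Repo', '', '', '0', '0'],
] * 6


def row_logic(api_client1, api_client2, row):
    # Placeholder for the real per-row logic: join URL, owner and repo.
    return '/'.join(row[:3])


def parse(api_client1, api_client2, rows, processes=4):
    # Bind the two clients up front; Pool.map then passes each row
    # as the remaining (third) positional argument.
    worker = partial(row_logic, api_client1, api_client2)
    with Pool(processes) as pool:
        # map blocks until all rows are done and keeps the input order.
        return pool.map(worker, rows)


if __name__ == '__main__':
    # 'client1' and 'client2' are hypothetical stand-ins for the API clients.
    OUTPUT_MAGIC_DATA_STRUCTURE = parse('client1', 'client2',
                                        INPUT_MAGIC_DATA_STRUCTURE)
    print(len(OUTPUT_MAGIC_DATA_STRUCTURE))
```

Building the output list from the return value of pool.map avoids appending to a global list inside the workers, which would only change each worker process's own copy.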
Peter Wood

Well, in Python it's not that easy. You need to map your rows to a function that parses a single row. Look at this link: https://gist.github.com/baojie/6047780

from multiprocessing import Process

def parse_row(row):
    ...  # some logic with row

def dispatch_job(rows, parse_row):
    # Start one process per row.
    for row in rows:
        p = Process(target=parse_row, args=(row,))
        p.start()
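
With one Process per row, nothing is returned to the caller, and a list appended to inside parse_row would only change in the child process. One way to get the parsed rows back is a multiprocessing.Queue. This is a rough sketch under assumptions: the results parameter and the body of parse_row are made up for illustration, and the rows here are sample data.

```python
from multiprocessing import Process, Queue


def parse_row(row, results):
    # Placeholder logic; the result is sent back to the parent via the queue.
    results.put('/'.join(row[:3]))


def dispatch_job(rows):
    results = Queue()
    processes = [Process(target=parse_row, args=(row, results)) for row in rows]
    for p in processes:
        p.start()
    # Read one result per started process before joining them.
    output = [results.get() for _ in processes]
    for p in processes:
        p.join()
    # Results arrive in completion order, not necessarily input order.
    return output


if __name__ == '__main__':
    rows = [['https://github.com', 'Owner', 'Repo', '', '', '0', '0']] * 3
    print(dispatch_job(rows))
```

For a long list this starts as many processes as there are rows, so a Pool with a fixed number of workers is usually the simpler choice.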
Tim Givois