
I have one CSV with SKUs and URLs. I break them into two lists with:

import csv

def myskus():
    skus = []
    with open('websupplies2.csv', 'r') as csvf:
        reader = csv.reader(csvf, delimiter=";")
        for row in reader:
            skus.append(row[0])  # add each SKU to the list
    return skus


def mycontents():
    contents = []
    with open('websupplies2.csv', 'r') as csvf:
        reader = csv.reader(csvf, delimiter=";")
        for row in reader:
            contents.append(row[1]) # Add each url to list contents
    return contents

Then I multiprocess my URLs, but I want to join the corresponding SKU:

from multiprocessing import Pool

if __name__ == "__main__":

    with Pool(4) as p:
        records = p.map(parse, web_links)

    if len(records) > 0:
        with open('output_websupplies.csv', 'a') as f:
            f.write('\n'.join(records))

Can I put records = p.map(parse, skus, web_links)?

I ask because the above is not working.

My desired output format would be:

sku    price    availability
bkk11  10,00    available

How can I achieve this?

Evridiki
  • Would this help your use case? https://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments https://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.starmap – Monty Mar 16 '19 at 22:32
    What does the original CSV look like? – FailSafe Mar 16 '19 at 23:07

1 Answer


minor refactor

I recommend naming your pair of functions def get_skus() and def get_urls(), to match your problem definition.

data structure

Having a pair of lists, skus and urls, does not seem like a good fit for your high level problem. Keep them together, as a list of (sku, url) tuples, or as a sku_to_url dict. That is, delete one of your two functions, so you're reading the CSV once, and keeping the related details together. Then your parse() routine would have more information available to it. The list of tuples boils down to Monty's starmap() suggestion.
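A minimal sketch of that approach. The parse() body below is just a placeholder for your real scraper (which presumably fetches and scrapes the URL), and the file-existence guard is only there so the demo is safe to run when the CSV is absent:

```python
import csv
import os
from multiprocessing import Pool

def get_sku_url_pairs(path):
    # read the CSV once, keeping each SKU together with its URL
    with open(path, newline='') as csvf:
        reader = csv.reader(csvf, delimiter=';')
        return [(row[0], row[1]) for row in reader]

def parse(sku, url):
    # placeholder for the real scraper: starmap() unpacks each
    # (sku, url) tuple into the two arguments seen here
    return '%s;%s' % (sku, url)

if __name__ == '__main__' and os.path.exists('websupplies2.csv'):
    pairs = get_sku_url_pairs('websupplies2.csv')
    with Pool(4) as p:
        records = p.starmap(parse, pairs)
```

With the SKU travelling alongside the URL, parse() can emit a complete record such as bkk11;10,00;available directly.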

writing results

You're using this:

    if len(records) > 0:
        with open('output_websupplies.csv', 'a') as f:
            f.write('\n'.join(records))

Firstly, testing for at least one record is probably superfluous; it's not the end of the world to open for append and then write zero records. If you care about the timestamp on the file, then perhaps it's a useful optimization.

More importantly, the write() seems Bad. One day an unfortunate character may creep into one of your records. Much better to feed your structured records to a csv.writer, to ensure appropriate quoting.
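For instance, a sketch with csv.writer, assuming each record is kept as a (sku, price, availability) tuple rather than a pre-joined string (the sample tuples here are made up):

```python
import csv

# hypothetical records shaped as (sku, price, availability) tuples
records = [('bkk11', '10,00', 'available'),
           ('bkk12', '12,50', 'out of stock')]

with open('output_websupplies.csv', 'a', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerows(records)  # quoting of awkward characters handled for you
```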

J_H