I have a question regarding Python multiprocessing.
I have a large CSV file, test.csv, with 2 million lines and two columns, firm_id and product_id. The last one, product_id, is an input to another function, say func1(product_id).

That is the basic setup. Since the file is very large and each product_id can be processed independently, I want to use Python's multiprocessing module, which I have never touched before. After googling for a while, I found some useful information (like this and this), but none of it enabled me to complete my task. I tried adapting the last one as shown below, but it did not work:
import itertools as IT
import multiprocessing as mp
import csv

import funcitons as fdfunc  # a self-defined module with function func1 in it

def worker(chunk):
    return len(chunk)

def main():
    num_procs = 2      # number of workers in the pool
    chunksize = 10**5  # number of lines in a chunk
    pool = mp.Pool(num_procs)
    largefile = 'test.csv'
    results = []
    with open(largefile, 'r') as f, \
         open('file_to_store_result.csv', 'a+') as res_file:
        reader = csv.reader(f)
        writer = csv.writer(res_file)
        for chunk in iter(lambda: list(IT.islice(reader, chunksize * num_procs)), []):
            chunk = iter(chunk)
            pieces = list(iter(lambda: list(IT.islice(chunk, chunksize)), []))
            # pieces['product_id'] is definitely wrong; it is just to show what I want to do
            result = pool.imap(fdfunc.func1, pieces['product_id'])
            for item in result:
                writer.writerow(item)
            results.append(result)

if __name__ == '__main__':
    main()
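To make my intent clearer, here is a minimal sketch of the logic I have in mind, assuming product_id is the second column of each row and that func1 returns a single value that can be written out as a one-column row (file_to_store_result.csv is just my output file; I am not sure this is the right approach):

import csv
import multiprocessing as mp

import funcitons as fdfunc  # self-defined module with func1 in it

def main():
    num_procs = 2
    largefile = 'test.csv'
    with mp.Pool(num_procs) as pool, \
         open(largefile, 'r') as f, \
         open('file_to_store_result.csv', 'a+', newline='') as res_file:
        reader = csv.reader(f)
        writer = csv.writer(res_file)
        next(reader, None)  # skip the header row, if there is one
        # lazily pull out just the product_id column (second column)
        product_ids = (row[1] for row in reader)
        # imap feeds product_ids to the workers in chunks and yields
        # results in order, so the whole file never sits in memory
        for item in pool.imap(fdfunc.func1, product_ids, chunksize=10**4):
            writer.writerow([item])

if __name__ == '__main__':
    main()

I am not sure whether imap's chunksize argument alone bounds memory the way my explicit chunking above tries to, or whether I still need to slice the reader manually.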
Does anyone know how to do what I was trying?