I am trying to use the `multiprocessing` module to work with a large CSV file. I am using Python 2.7 and following the example from here.
I ran the unmodified code (copied below for convenience) and noticed that `print` statements within the `worker` function do not work. The inability to print makes it difficult to understand the flow and debug.

Can anyone please explain why `print` is not working here? Does `pool.map` not execute `print` commands? I searched online but did not find any documentation that would indicate this.
```python
import multiprocessing as mp
import itertools
import time
import csv

def worker(chunk):
    # `chunk` will be a list of CSV rows all with the same name column
    # replace this with your real computation
    print(chunk)  # <----- nothing prints
    print 'working'  # <----- nothing prints
    return len(chunk)

def keyfunc(row):
    # `row` is one row of the CSV file.
    # replace this with the name column.
    return row[0]

def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    with open(largefile) as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    main()
```
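The `while True` loop in `main()` pulls up to `num_chunks` groups at a time out of `itertools.groupby`. A minimal sketch of just that batching step, run on an in-memory list of rows instead of `test.dat`, shows what each `worker` call receives without involving `multiprocessing` at all. The helper name `batch_groups` and the sample rows are hypothetical. Note that `itertools.groupby` only merges *consecutive* rows with the same key, so the comment in `worker` ("all with the same name column") holds only if the file is sorted by that column; the repeated `alice` row below illustrates this.

```python
from __future__ import print_function

import itertools


def keyfunc(row):
    # Same as in the question: the first column is the grouping key.
    return row[0]


def batch_groups(rows, key, num_chunks):
    """Hypothetical helper mirroring the while-loop in main().

    Yields lists of up to `num_chunks` groups; each group is a list of
    consecutive rows sharing the same key.
    """
    chunks = itertools.groupby(rows, key)
    while True:
        # Each group iterator is consumed by list() before groupby advances.
        groups = [list(chunk) for _, chunk in itertools.islice(chunks, num_chunks)]
        if not groups:
            break
        yield groups


if __name__ == '__main__':
    rows = [['alice', '1'], ['alice', '2'], ['bob', '3'],
            ['carol', '4'], ['carol', '5'], ['alice', '6']]
    for groups in batch_groups(rows, keyfunc, 2):
        # Sizes of the groups that one pool.map call would receive.
        print([len(g) for g in groups])
```

With `num_chunks=2` this prints `[2, 1]` twice: the trailing `alice` row forms its own group because it is not adjacent to the first two.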
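One way to confirm that `pool.map` does call `worker`, without depending on where the child processes' standard output ends up, is to send the diagnostics back to the parent as part of the return value. The sketch below assumes a hypothetical `traced_worker` in place of `worker` and a small hypothetical `run` helper. It also calls `sys.stdout.flush()` after printing in the child, so that any output which does reach a console is not held back in the child's buffer.

```python
from __future__ import print_function

import multiprocessing as mp
import os
import sys


def traced_worker(chunk):
    # Hypothetical variant of worker(): build a diagnostic message that is
    # returned to the parent instead of relying only on the child's stdout.
    msg = 'pid %d got %d rows' % (os.getpid(), len(chunk))
    print(msg)
    sys.stdout.flush()  # do not leave the line in the child's buffer
    return len(chunk), msg


def run(groups, processes=2):
    # Hypothetical helper; Pool is not a context manager in Python 2.7,
    # so close/join explicitly.
    pool = mp.Pool(processes)
    try:
        results = pool.map(traced_worker, groups)
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == '__main__':
    groups = [[['alice', '1'], ['alice', '2']], [['bob', '3']]]
    for n, msg in run(groups):
        print(n, msg)  # printed by the parent process
```

Because the messages travel back through the return value of `pool.map`, they are printed by the parent process and therefore show up wherever the parent's own output appears.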