I am trying to use the multiprocessing module to work with a large CSV file. I am using Python 2.7 and following the example from here.

I ran the unmodified code (copied below for convenience) and noticed that print statements within the worker function do not work. The inability to print makes it difficult to understand the flow and debug.

Can anyone please explain why print is not working here? Does pool.map not execute print commands? I searched online but did not find any documentation that would indicate this.

import multiprocessing as mp
import itertools
import time
import csv

def worker(chunk):
    # `chunk` will be a list of CSV rows all with the same name column
    # replace this with your real computation
    print(chunk)     # <----- nothing prints
    print 'working'  # <----- nothing prints
    return len(chunk)  

def keyfunc(row):
    # `row` is one row of the CSV file.
    # replace this with the name column.
    return row[0]

def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    with open(largefile) as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    main()
Roberto
  • How are you running your program? Some IDEs may not set up their virtual terminal in a way that allows the worker processes to write to them. If you try running your program from the command line, however, I'd expect printing to work. – Blckknght May 05 '14 at 19:48
  • possible duplicate of [No print output from child multiprocessing.Process unless the program crashes](http://stackoverflow.com/questions/367053/python-multiprocessing-misunderstandings) – jscs May 05 '14 at 19:49
  • I think @Blckknght is on the right track. Printing like that should work fine. If I take your exact code, pass it a hard-coded list of lists instead of reading in a csv file, and run it via the CLI, it prints just like it should. – dano May 05 '14 at 19:50
  • @Blckknght - thanks for the response. I am running via IDLE – Roberto May 05 '14 at 20:00
  • @Josh Caswell - Thanks. The sys.stdout.flush() solution does not work within `worker` (it fails with a "Bad file descriptor" error). But I will keep trying along this line of thought. – Roberto May 05 '14 at 20:07

1 Answer

This is an issue with IDLE, which you're using to run your code. IDLE does a fairly basic emulation of a terminal for handling the output of a program you run in it. It cannot handle subprocesses though, so while they'll run just fine in the background, you'll never see their output.
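
If you have to stay in IDLE, one possible workaround is to have each worker append its diagnostics to a file instead of printing. The sketch below is only an illustration: `log_path` is a hypothetical helper, the per-process file name and the use of the system temp directory are assumptions, and the sample rows stand in for the real CSV data.

```python
from __future__ import print_function

import multiprocessing as mp
import os
import tempfile


def log_path(pid):
    # Hypothetical helper: one log file per worker process, in the temp dir.
    return os.path.join(tempfile.gettempdir(), 'worker_%d.log' % pid)


def worker(chunk):
    # Write to a file rather than stdout, since IDLE does not show child output.
    with open(log_path(os.getpid()), 'a') as log:
        log.write('working on %d rows\n' % len(chunk))
    return len(chunk)


if __name__ == '__main__':
    # Made-up rows, grouped by their first column as in the question.
    groups = [[['a', '1'], ['a', '2']], [['b', '3']]]
    pool = mp.Pool(2)
    results = pool.map(worker, groups)
    pool.close()
    pool.join()
    print(results)
    print('worker logs are in', tempfile.gettempdir())
```

After a run, the `worker_<pid>.log` files can be opened in any editor to follow what each process did.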

The simplest fix is to run your code from the command line.

An alternative might be to use a more sophisticated IDE. There are a bunch of them listed on the Python wiki, though I'm not sure which ones have better terminal emulation for multiprocessing output.
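
Another option that does not depend on the IDE is to avoid printing in the children altogether: return the messages together with the result and let the parent process print them. This is a minimal sketch under that assumption, not the code from the question; the `run` function, the message text, and the sample rows are made up for illustration.

```python
from __future__ import print_function

import multiprocessing as mp


def worker(chunk):
    # Collect diagnostics instead of printing from the child process.
    messages = ['working on %d rows' % len(chunk)]
    return len(chunk), messages


def run(groups):
    pool = mp.Pool(2)
    try:
        outputs = pool.map(worker, groups)
    finally:
        pool.close()
        pool.join()
    results = []
    for count, messages in outputs:
        for message in messages:
            print(message)  # printed by the parent process
        results.append(count)
    return results


if __name__ == '__main__':
    # Made-up rows, grouped by their first column as in the question.
    print(run([[['a', '1'], ['a', '2']], [['b', '3']]]))
```

The trade-off is that messages only appear once `pool.map` returns, so this helps with understanding the flow after the fact rather than watching it live.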

Blckknght
  • thanks, works fine when I ran at home (in Eclipse). My work setup is barebones, which is why I use IDLE instead of an IDE. I may need to rethink this for this particular project. – Roberto May 09 '14 at 00:35