My program spawns multiple processes to do some time-consuming calculations. The results are then collected in a queue, and a writer process writes them to an output file.

Below is a simplified version of my code that should illustrate my issue. If I comment out the flush call in the Writer class, test.out is empty at the end of the program.

What exactly is happening here? Is test.out not closed properly? Was it naive to assume that passing the file handle to an autonomous process should work in the first place?

from multiprocessing import JoinableQueue, Process

def main():
    queue = JoinableQueue()
    queue.put("hello world!")

    with open("test.out", "w") as outhandle:
        wproc = Writer(queue, outhandle)
        wproc.start()
        queue.join()  # block until every queued item has been marked task_done()

    with open("test.out") as handle:
        for line in handle:
            print(line.strip())

class Writer(Process):

    def __init__(self, queue, handle):
        Process.__init__(self)
        self.daemon = True  # the writer is killed when the main process exits
        self.queue = queue
        self.handle = handle

    def run(self):
        while True:
            msg = self.queue.get()
            print(msg, file=self.handle)
            #self.handle.flush()
            self.queue.task_done()

if __name__ == '__main__':
    main()
cel
  • I'm not using python3, but when I uncomment `self.handle.flush()` and use `self.handle.write(msg)` instead of `print(msg, file=self.handle)` your code works. – Christian Eichelmann Mar 23 '15 at 07:57
  • There _are_ ways to pass a file handle to a child process, but it requires either (1) starting the child process from a parent with it open -- which means that pool types that preopen the children or reuse a single child to process multiple items that aren't all already known when the child is started can't do it -- or (2) using [deep UNIX magic](https://stackoverflow.com/questions/28003921/sending-file-descriptor-by-linux-socket) (requiring that it be a very specific socket type used for communication between the processes). – Charles Duffy May 24 '23 at 21:04
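For what it's worth, a minimal sketch of the second approach from the comment above, assuming a Unix system and Python 3.9+ (where socket.send_fds/recv_fds exist); the fork/socketpair setup here is illustrative, not from the question:

import os
import socket

# Parent and child share a Unix-domain socket pair; the parent sends the
# descriptor of an already-open file across it via SCM_RIGHTS.
parent_sock, child_sock = socket.socketpair()

pid = os.fork()
if pid == 0:  # child process
    parent_sock.close()
    # Receive one file descriptor; recv_fds returns (data, fds, flags, addr).
    _, fds, _, _ = socket.recv_fds(child_sock, 1024, 1)
    with os.fdopen(fds[0], "w") as handle:
        handle.write("written through a passed descriptor\n")
    os._exit(0)
else:  # parent process
    child_sock.close()
    with open("test.out", "w") as outhandle:
        # Send the descriptor alongside a one-byte message.
        socket.send_fds(parent_sock, [b"x"], [outhandle.fileno()])
        os.waitpid(pid, 0)  # wait for the child before closing the file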

2 Answers

The writer is a separate process. The data it writes to the file might be buffered, and because the process keeps running, it doesn't know that it should flush the buffer (write it to the file). Flushing manually is the right thing to do.

Normally, the file would be closed when you exit the with block, and this would flush the buffer. But the parent process doesn't know anything about its children's buffers, so the child has to flush its own buffer (closing the file in the child should work too - that doesn't close the file for the parent, at least on Unix systems).
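For illustration, a drop-in replacement for the run method from the question, with the flush described above (the rest of the Writer class stays as posted):

    def run(self):
        while True:
            msg = self.queue.get()
            print(msg, file=self.handle)
            # Push the child's buffered output to the OS now; the daemon
            # process is killed when main() returns, so anything still
            # sitting in its buffer would otherwise be lost.
            self.handle.flush()
            self.queue.task_done()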

Also, check out the Pool class from multiprocessing (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) - it might save you some work.
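For example, a minimal sketch of the Pool-based approach; calculate and the input list are placeholders standing in for the question's time-consuming work:

from multiprocessing import Pool

def calculate(item):
    # stand-in for the actual time-consuming calculation
    return item.upper()

def main():
    items = ["hello world!", "goodbye world!"]
    with Pool() as pool, open("test.out", "w") as outhandle:
        # The workers only compute; every write happens in the parent,
        # whose `with` block flushes and closes the file as usual.
        for result in pool.imap_unordered(calculate, items):
            print(result, file=outhandle)

if __name__ == "__main__":
    main()

Because only the parent ever touches the file, the child-buffer problem from the question never comes up.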

Teyras

I have experienced the same issue when writing the output of a pooled computation directly to a file.

I sorted it out as follows: collect the pooled results into a list, and then write them to the file.

This mainly happens because the hard disk can't keep up with the speed at which the processor produces results, so buffered content gets lost, or the pooled data ends up out of order.

The best thing to do is to collect the pooled output in memory (a string or a list) and then write it to the file in one go, as in the sketch below; that way things should get sorted out.
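A minimal sketch of that pattern, with calculate and the inputs as placeholders for the actual pooled work:

from multiprocessing import Pool

def calculate(item):
    # stand-in for whatever each worker computes
    return str(item * item)

if __name__ == "__main__":
    with Pool() as pool:
        # map blocks until all workers finish and returns the results
        # as a list in the same order as the inputs.
        results = pool.map(calculate, range(10))
    # One write from the parent process: no child-side buffers to lose,
    # and the output order matches the input order.
    with open("test.out", "w") as outhandle:
        outhandle.write("\n".join(results) + "\n")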

Good luck!

Chamara
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 21 '21 at 11:31