How can I redirect prints that occur within a multiprocessing Pool into a StringIO()?

I am redirecting sys.stdout into a StringIO(). This works well as long as I don't use Pool from the multiprocessing library.

This toy code is an example:

import io
import sys
from multiprocessing import Pool

print_file = io.StringIO()
sys.stdout = print_file

def a_print_func(some_string):
    print(some_string)

pool = Pool(2)  
out = pool.map(a_print_func, [['test_1','test_1'],['test_2','test_2']])

a_print_func('no_pool')
print('no_pool, no_func')

fd = open('file.txt', 'w')
fd.write(print_file.getvalue())
fd.close()

file.txt only contains:

no_pool
no_pool, no_func

instead of:

test_1
test_1
test_2
test_2
no_pool
no_pool, no_func
Roy2012
AnarKi
  • Does this answer your question? [how-do-i-get-a-thread-safe-print-in-python-2-6](https://stackoverflow.com/questions/3029816) – stovfl Jun 17 '20 at 13:13
  • @stovfl I can't see how exactly. Plus, wouldn't the use of a thread lock cancel the multiprocessing aspect? – AnarKi Jun 17 '20 at 13:31
  • Since memory is not shared between the processes, the object "print_file" is not shared between the parent process and the child ones. As a result, what's written to print_file by the children is not seen by the parent. – Roy2012 Jun 17 '20 at 13:33
  • @AnarKi You have to use a `multiprocessing` lock instead. – stovfl Jun 17 '20 at 13:34
  • If you'd like, there are ways to direct all the child process to a file. – Roy2012 Jun 17 '20 at 13:36
  • @Roy2012 how can you get that ? – AnarKi Jun 17 '20 at 13:36
  • @stovfl sure, but what I mean is that you lose the parallel run aspect which is essential for me – AnarKi Jun 17 '20 at 13:37
  • **lose the parallel run aspect**: Yes, the core point is you have to serialize your print statements. Another approach is to use a queue and a single process to read from the queue and do the print. – stovfl Jun 17 '20 at 13:40
  • @AnarKi - see my answer below. – Roy2012 Jun 17 '20 at 13:41
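
The queue idea from the comments above could look roughly like the sketch below. It is only an illustration under some assumptions: `QueueWriter` and `init_worker` are hypothetical helper names (not part of `multiprocessing`), the parent process plays the role of the single reader, and a `Manager().Queue()` is used so that every `put` has completed by the time `pool.map` returns. Lines from different workers can arrive in any order.

```python
import io
import sys
from multiprocessing import Manager, Pool


class QueueWriter:
    """Hypothetical file-like object that forwards complete lines to a queue."""

    def __init__(self, queue):
        self.queue = queue
        self._pending = ""  # text received so far that has no newline yet

    def write(self, text):
        # print() calls write() separately for the text and the newline,
        # so buffer until a full line is available before queueing it.
        self._pending += text
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            self.queue.put(line + "\n")
        return len(text)

    def flush(self):
        pass  # nothing to do: complete lines are queued immediately


def init_worker(queue):
    # Runs once in each worker: only the worker's stdout is replaced.
    sys.stdout = QueueWriter(queue)


def a_print_func(some_string):
    print(some_string)


if __name__ == "__main__":
    print_file = io.StringIO()
    with Manager() as manager:
        queue = manager.Queue()
        with Pool(2, initializer=init_worker, initargs=(queue,)) as pool:
            pool.map(a_print_func, [['test_1', 'test_1'], ['test_2', 'test_2']])
        # All tasks are done, so the parent can drain the queue.
        while not queue.empty():
            print_file.write(queue.get())
    sys.stdout.write(print_file.getvalue())
```

The workers still run in parallel; only the hand-off of finished lines goes through the queue. Text that a worker prints without a trailing newline stays in that worker's buffer and is lost in this sketch.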

1 Answer

Here's a solution for directing the output from all the child processes to a single file, using an initializer:

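import sys
from multiprocessing import Pool

# Opened before the pool is created. This relies on the workers inheriting
# the open handle, as happens with the fork start method.
print_file = open("file.txt", "w")

def a_print_func(some_string):
    # flush=True writes each line to the file right away instead of leaving
    # it in the worker's buffer.
    print(some_string, flush=True)

def foo(*args):
    # Pool initializer: runs once in every worker and redirects only that
    # worker's stdout. The parent's stdout is left untouched.
    sys.stdout = print_file

pool = Pool(2, initializer=foo)
out = pool.map(a_print_func, [['test_1','test_1'],['test_2','test_2']])
pool.close()
pool.join()  # let the workers exit cleanly

a_print_func('no_pool')
print('no_pool, no_func')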

The output of the program is:

no_pool
no_pool, no_func

And the content of file.txt at the end of the execution is:

['test_1', 'test_1']
['test_2', 'test_2']
Roy2012
  • Does this answer your question? – Roy2012 Jun 19 '20 at 06:10
  • This won't work if you use `print_file = io.StringIO()`. Do you know why that is and how to work around it? – AnarKi Jun 26 '20 at 13:48
  • Every child process can have its own StringIO output, but they can't share it AFAIK. StringIO lives in memory, and unless you find a way to share the memory StringIO is using between multiple processes, you can't do that. What you can do is have a StringIO in every child process, and print it to some file at the end of that child process. – Roy2012 Jun 26 '20 at 14:32
  • Alternatively, the parent process can capture the child processes' stdout, and do whatever it wants with them. – Roy2012 Jun 26 '20 at 14:33
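
One way to read the last two comments is sketched below: each task prints into its own StringIO inside the worker and returns the captured text, and the parent collects the returned strings into its own StringIO. This is a minimal sketch, not the answerer's code; `captured_print_func` is a hypothetical wrapper name, and `contextlib.redirect_stdout` is used in place of assigning `sys.stdout` by hand.

```python
import contextlib
import io
from multiprocessing import Pool


def a_print_func(some_string):
    print(some_string)


def captured_print_func(some_string):
    """Hypothetical wrapper: run a_print_func and return what it printed."""
    buffer = io.StringIO()  # private to this call, lives in the worker
    with contextlib.redirect_stdout(buffer):
        a_print_func(some_string)
    # A plain str can be pickled, so it can be sent back to the parent.
    return buffer.getvalue()


if __name__ == "__main__":
    print_file = io.StringIO()

    with Pool(2) as pool:
        # map() returns results in input order, so the text stays ordered.
        items = [['test_1', 'test_1'], ['test_2', 'test_2']]
        for text in pool.map(captured_print_func, items):
            print_file.write(text)

    # Prints made in the parent are captured in the usual way.
    with contextlib.redirect_stdout(print_file):
        a_print_func('no_pool')
        print('no_pool, no_func')

    with open('file.txt', 'w') as fd:
        fd.write(print_file.getvalue())
```

Nothing is shared between processes here; only strings travel back as return values, so the parent's StringIO ends up holding everything. The trade-off is that output becomes visible only once a task has finished.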