I have the following Python program, which starts three processes that each write 10000 random rows to the same file using an inherited file handle:
import multiprocessing
import random
import string
import traceback

if __name__ == '__main__':
    # clear out the file first
    open('out.txt', 'w')

    # initialise file handle to be inherited by sub-processes
    file_handle = open('out.txt', 'a', newline='', encoding='utf-8')

    process_count = 3

# routine to be run by sub-processes
# adds n lines to the file
def write_random_rows(n):
    try:
        letters = string.ascii_lowercase
        for _ in range(n):
            s = ''.join(random.choice(letters) for _ in range(100))
            file_handle.write(s + "\n")
    except Exception:
        traceback.print_exc()

if __name__ == '__main__':
    # initialise the multiprocessing pool
    process_pool = multiprocessing.Pool(processes=process_count)

    # write the rows
    for i in range(process_count):
        process_pool.apply_async(write_random_rows, (10000,))
        # write_random_rows(10000)

    # wait for the sub-processes to finish
    process_pool.close()
    process_pool.join()
As a result of running this, I expect the file to contain 30000 rows. If I call write_random_rows(10000) directly inside my main loop (the commented-out line in the above program), 30000 rows are written to the file as expected. However, if I run the non-commented line, process_pool.apply_async(write_random_rows, (10000,)), I end up with 15498 rows in the file.
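For reference, a quick way to verify the row count, separate from the program above:

# sanity check, run after the program finishes: count the lines in out.txt
with open('out.txt', encoding='utf-8') as f:
    print(sum(1 for _ in f))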
Strangely, no matter how many times I rerun this script, I always get the same (incorrect) number of rows in the output file.
I can fix this issue by initializing the file handle from within write_random_rows(), i.e. within the sub-process execution (see the sketch at the end of this question), which suggests that the inherited file handles are somehow interfering with each other. If it were related to some kind of race condition, though, I would expect the number of rows to change each time I ran the script. Why exactly does this issue occur?
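For reference, here is roughly what the working variant looks like (a sketch only; the one change is that the handle is opened inside the worker rather than inherited):

# working variant: each sub-process opens its own file handle
def write_random_rows(n):
    try:
        letters = string.ascii_lowercase
        # opening the file here, inside the sub-process, avoids the problem
        with open('out.txt', 'a', newline='', encoding='utf-8') as fh:
            for _ in range(n):
                s = ''.join(random.choice(letters) for _ in range(100))
                fh.write(s + "\n")
    except Exception:
        traceback.print_exc()

With this version, the file contains all 30000 rows.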