It seems obvious that writing to the same file from multiple processes may corrupt its data if the write() calls are not synchronized somehow. See this other question: Python multiprocessing safely writing to a file.
However, while trying to reproduce this possible bug for testing purposes, I was not able to get the lines in the file mixed up. I wanted to do this so I could compare the behaviour with and without lock protection (see the lock-based sketch after the code below).
Without doing anything special, the file seems to be protected somehow:
import multiprocessing
import random

NUM_WORKERS = 10
LINE_SIZE = 10000
NUM_LINES = 10000

def writer(i):
    # Each worker appends NUM_LINES identical lines made of its own id.
    line = ("%d " % i) * LINE_SIZE + "\n"
    with open("file.txt", "a") as file:
        for _ in range(NUM_LINES):
            file.write(line)

def check(file):
    # Every line must contain exactly LINE_SIZE copies of a single worker id.
    for _ in range(NUM_LINES * NUM_WORKERS):
        values = next(file).strip().split()
        assert len(values) == LINE_SIZE
        assert len(set(values)) == 1

if __name__ == "__main__":
    processes = []
    for i in range(NUM_WORKERS):
        process = multiprocessing.Process(target=writer, args=(i,))
        processes.append(process)

    for process in processes:
        process.start()
    for process in processes:
        process.join()

    with open("file.txt", "r") as file:
        check(file)
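For comparison, this is roughly the lock-protected variant I wanted to measure against. It is only a sketch: the shared multiprocessing.Lock, the flush inside the locked section and the separate output file name are my own choices, not taken from the linked question.

import multiprocessing

NUM_WORKERS = 10
LINE_SIZE = 10000
NUM_LINES = 10000

def locked_writer(i, lock):
    line = ("%d " % i) * LINE_SIZE + "\n"
    with open("file_locked.txt", "a") as file:
        for _ in range(NUM_LINES):
            # Hold the lock until the line has been flushed to the OS,
            # so no other process can interleave its own write() calls.
            with lock:
                file.write(line)
                file.flush()

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(target=locked_writer, args=(i, lock))
        for i in range(NUM_WORKERS)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()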
I'm running this on Linux, and I also know that appending to a file may be atomic depending on the size of the write buffer: Is file append atomic in UNIX?
I tried increasing the size of the messages, but that still does not produce corrupted data.
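Just to put numbers on that point: a single line in the test above is already larger than the default userspace buffer (a quick check; open() may actually pick the filesystem's block size rather than this fallback value):

import io

LINE_SIZE = 10000

# Fallback buffer size for Python's buffered I/O (typically 8192 bytes).
print(io.DEFAULT_BUFFER_SIZE)

# Size in bytes of one line written by a worker (about 20001 bytes here).
line = ("%d " % 9) * LINE_SIZE + "\n"
print(len(line.encode()))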
Do you know of any code sample that produces a corrupted file when using multiprocessing on Linux?