I have spent a considerable amount of time trying to get the Linux diff and patch tools to work with strings in Python. To achieve this I use named pipes, since they seem the most robust way to go. The problem is that this doesn't work for large inputs.
Example:
    a, b = str1, str2  # ~1 MB each
    fname1, fname2 = mkfifos(2)
    proc = subprocess.Popen(['diff', fname1, fname2],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print('Writing first file.')
    with open(fname1, 'w') as f1:
        f1.write(a)
    print('Writing second file.')
    with open(fname2, 'w') as f2:
        f2.write(b)
This hangs at the first write. I figured out that if I use a[:6500] instead, it hangs at the second write, so I assume it has something to do with buffering. I tried manually flushing after each write, closing the file, and using the low-level os.open(f, 'r', 0) with a buffer size of 0, but the issue remains.
I thought of looping through the write in chunks, but that feels wrong in a high-level language like Python. Any ideas what I am doing wrong?
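
For reference, here is a sketch of one restructuring that might avoid the hang. It rests on an explanation I have not confirmed for this exact case: opening a FIFO blocks until the other end is opened, and a pipe holds only a limited amount of data (64 KiB by default on current Linux kernels), so a sequential writer can stall while diff is still waiting on the other pipe or on its unread stdout. The sketch feeds each FIFO from its own thread and drains diff's output with communicate(). The names diff_strings and feed are hypothetical, and it creates the pipes with os.mkfifo directly instead of the mkfifos helper used above.

```python
import os
import subprocess
import tempfile
import threading


def diff_strings(a, b):
    """Run `diff` on two strings via named pipes (hypothetical helper).

    Returns (returncode, stdout, stderr).
    """
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, name) for name in ('a.fifo', 'b.fifo')]
    for path in paths:
        os.mkfifo(path)

    def feed(path, text):
        # open() blocks until diff opens this FIFO for reading, and write()
        # blocks whenever the pipe is full, so each FIFO gets its own thread.
        try:
            with open(path, 'w') as f:
                f.write(text)
        except BrokenPipeError:
            pass  # diff exited before reading everything

    try:
        proc = subprocess.Popen(['diff', paths[0], paths[1]],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        writers = [threading.Thread(target=feed, args=(path, text))
                   for path, text in zip(paths, (a, b))]
        for w in writers:
            w.daemon = True  # do not keep the interpreter alive on a stall
            w.start()
        # communicate() keeps reading stdout and stderr, so diff never
        # blocks on a full output pipe while the writers are still busy.
        out, err = proc.communicate()
        for w in writers:
            w.join()
    finally:
        for path in paths:
            os.remove(path)
        os.rmdir(tmpdir)
    return proc.returncode, out, err
```

Called as diff_strings(a, b), this returns diff's exit status (0 for identical inputs, 1 for differences) together with its output, and the writes no longer depend on the order in which diff opens and reads the two pipes.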