My ultimate goal is to pipe together command-line processes that operate on files, without ever touching the disk. Is this possible? I can't use stdin/stdout because some of the processes I need to run only accept files (sometimes more than one) as inputs. I have been able to do this with FIFOs and Popen in Python for small files, but not for larger ones (on the MB scale). Here is a snippet of the code I'm using to test this functionality.
import os
import shlex
from subprocess import Popen, PIPE
from threading import Thread

fifo1 = os.getcwd() + '/fifo1.nii'
fifo2 = os.getcwd() + '/fifo2.nii'
command = 'diff \'' + fifo1 + '\' \'' + fifo2 + '\''
os.mkfifo(fifo1)
os.mkfifo(fifo2)

# Read both input files fully into memory.
with open('1_brain.nii', 'rb', 0) as r:
    s1 = r.read()
with open('run1.nii', 'rb', 0) as r:
    s2 = r.read()

# Each writer thread feeds one FIFO; open() blocks until diff opens the read end.
def write(fifo, s):
    with open(fifo, 'wb', 0) as f:
        f.write(s)

writer1 = Thread(target=write, args=[fifo1, s1])
writer1.start()
writer2 = Thread(target=write, args=[fifo2, s2])
writer2.start()

proc = Popen(shlex.split(command), stdout=PIPE)
try:
    while proc.poll() is None:
        continue
    print proc.communicate()[0]
except:
    if proc.poll() is None:
        proc.kill()
    os.unlink(fifo1)
    os.unlink(fifo2)
    raise
os.unlink(fifo1)
os.unlink(fifo2)
This works with small text files, but when I run it on large binary files, I get a broken pipe error on my writing threads, so it seems like the read end (the diff process) is closing before the write finishes. I have gotten file-reading processes to read stdin by using a symlink to the stdin file descriptor, but I can't use stdin since I sometimes need multiple inputs. Is there a way to get FIFOs to work, or is it possible to create my own file descriptors that work like stdin to send data into processes? Please let me know if any of this is unclear! Thanks.
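For reference, here is the sort of approach I'm imagining when I ask about "my own file descriptors that work like stdin": hand the child process paths under `/dev/fd/` that refer to pipes I've created, the way shell process substitution does. This is only a sketch under assumptions, not code I've verified at scale (it uses Python 3's `pass_fds`, the `/dev/fd` filesystem found on Linux/macOS, and tiny placeholder byte strings instead of my real `.nii` data):

```python
import os
from subprocess import Popen, PIPE
from threading import Thread

# Placeholder data standing in for the real file contents.
data1 = b'hello\n'
data2 = b'hello\n'

# One anonymous pipe per input; the child reads each via /dev/fd/N.
r1, w1 = os.pipe()
r2, w2 = os.pipe()

def feed(wfd, data):
    # fdopen takes ownership of the fd; closing it signals EOF to the reader.
    with os.fdopen(wfd, 'wb') as f:
        f.write(data)

Thread(target=feed, args=(w1, data1)).start()
Thread(target=feed, args=(w2, data2)).start()

# pass_fds keeps the pipe read ends open (and inheritable) in the child.
proc = Popen(['diff', '/dev/fd/%d' % r1, '/dev/fd/%d' % r2],
             stdout=PIPE, pass_fds=(r1, r2))
out, _ = proc.communicate()
os.close(r1)
os.close(r2)
print(out)
```

With identical inputs, `diff` should exit 0 with empty output; what I don't know is whether this avoids the same blocking/broken-pipe behavior I'm seeing with named FIFOs once the data exceeds the pipe buffer.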