
My ultimate goal is to pipe together command-line processes that operate on files without ever touching the disk. Is this possible? I can't use stdin/stdout because some of the processes I need to run only accept files (sometimes more than one) as input. Within Python, I have been able to do this with FIFOs and Popen for small files, but not for larger files (on the MB scale). Here is a snippet of the code I'm using to test this functionality.

import os
import shlex
from threading import Thread
from subprocess import Popen, PIPE

fifo1 = os.getcwd()+'/fifo1.nii'
fifo2 = os.getcwd()+'/fifo2.nii'

command = 'diff \''+fifo1+'\' \''+fifo2+'\''

os.mkfifo(fifo1)
os.mkfifo(fifo2)

with open('1_brain.nii', 'rb', 0) as r:
    s1 = r.read()
with open('run1.nii', 'rb', 0) as r:
    s2 = r.read()

def write(fifo, s):
    # opening a FIFO for writing blocks until the other end is opened for reading
    with open(fifo, 'wb', 0) as f:
        f.write(s)

writer1 = Thread(target=write, args=[fifo1, s1])
writer1.start()

writer2 = Thread(target=write, args=[fifo2, s2])
writer2.start()

proc = Popen(shlex.split(command), stdout=PIPE)

try:
    while proc.poll() == None:
        continue
    print proc.communicate()[0]
except:
    if proc.poll() == None:
        proc.kill()
    os.unlink(fifo1)
    os.unlink(fifo2)
    raise

os.unlink(fifo1)
os.unlink(fifo2)

This works with small text files, but when I run it on large binary files I get a broken pipe error in my writer threads, so it seems the read end (the diff process) closes before the writes finish. I have gotten file-reading processes to read from stdin by using a symlink to the stdin file descriptor, but I can't use stdin here since I sometimes need multiple inputs. Is there a way to get FIFOs to work, or is it possible to create my own file descriptors that, like stdin, feed data into a process? Please let me know if any of this is unclear! Thanks.
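
One way to get file descriptors that behave like extra stdins (the approach linked in the comments below) is to create anonymous pipes in the parent and hand their read ends to the child as /dev/fd/N filenames. A minimal sketch, assuming Python 3 (for pass_fds), a /dev/fd filesystem as on Linux or macOS, and the same input files as above:

import os
import threading
from subprocess import Popen, PIPE

def feed(write_fd, path):
    # stream a file into the write end of a pipe, then close it so the reader sees EOF
    with open(path, 'rb') as src, os.fdopen(write_fd, 'wb') as dst:
        for chunk in iter(lambda: src.read(1 << 20), b''):
            dst.write(chunk)

r1, w1 = os.pipe()
r2, w2 = os.pipe()

# the child opens the read ends as ordinary filenames under /dev/fd/
proc = Popen(['diff', '/dev/fd/%d' % r1, '/dev/fd/%d' % r2],
             stdout=PIPE, pass_fds=(r1, r2))
os.close(r1)  # the parent keeps only the write ends
os.close(r2)

t1 = threading.Thread(target=feed, args=(w1, '1_brain.nii'))
t2 = threading.Thread(target=feed, args=(w2, 'run1.nii'))
t1.start()
t2.start()

out = proc.communicate()[0]  # drains diff's stdout while the writers run
t1.join()
t2.join()
print(out)

Because pass_fds keeps only the listed descriptors open in the child, diff sees EOF on each input as soon as the corresponding writer closes its end. Note that if the reader exits before consuming all of its input (diff may stop as soon as it decides two binary files differ), the writer gets the same broken-pipe error as above, so a real pipeline would catch that inside feed().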

Shark
  • See the pdf on [Python Generator Tricks For Systems Programmers](http://www.dabeaz.com/generators/) – Peter Wood Sep 11 '15 at 19:18
  • (1) 50MB is not large for a computer capable of running CPython. It indicates a bug in your code (2) Why do you read the files into memory only to dump them into named pipes instead of passing the files to the child process directly? Create a [complete minimal code example (use a dummy python script as a child process)](http://stackoverflow.com/help/mcve) (3) Drop bogus `while proc.poll() == None` loop, use just `proc.communicate()` instead. (4) Unrelated: you could [use `/dev/fd/N` filenames instead of named pipes](http://stackoverflow.com/a/28840955/4279) – jfs Sep 11 '15 at 20:53
  • related: [Create and pipe a file-like object as input for a command](http://stackoverflow.com/a/30894258/4279) – jfs Sep 11 '15 at 21:01
  • I've tried a variation of [the code from the previous comment](http://stackoverflow.com/a/30894258/4279) with 5GB input (100 times larger than your case). It works fine. – jfs Sep 11 '15 at 22:44
  • Thanks for the response! To address your question, I'm dumping into named pipes because my final goal is to chain together multiple processes, so the idea is that one process will dump into a pipe and the next will read it. Will the /dev/fd/ file descriptors act like a pipe, in that, will they be capable of the behavior I just described (dumping from one process, then reading in the next)? – Shark Sep 13 '15 at 17:49
  • @Shark: yes, the [code example from the link in my 1st comment](http://stackoverflow.com/a/28840955/4279) demonstrates exactly that. – jfs Sep 15 '15 at 14:45
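
A minimal sketch of the chaining pattern discussed in the last two comments, with one process writing into an anonymous pipe and the next reading it through a /dev/fd/N filename. It assumes Python 3, and gzip and wc -c stand in here for the real file-only tools:

import os
from subprocess import Popen, PIPE

read_fd, write_fd = os.pipe()

# stage 1: the producer writes its output into the pipe
producer = Popen(['gzip', '-c', '1_brain.nii'], stdout=write_fd)
os.close(write_fd)  # drop the parent's copy so the reader can see EOF

# stage 2: a tool that insists on a filename reads the pipe via /dev/fd/
consumer = Popen(['wc', '-c', '/dev/fd/%d' % read_fd],
                 stdout=PIPE, pass_fds=(read_fd,))
os.close(read_fd)

out = consumer.communicate()[0]
producer.wait()
print(out)

The same pattern extends to longer chains: each stage receives its predecessor's read end as a /dev/fd/N argument, and nothing is written to disk.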

0 Answers