
A program that I cannot modify writes its output to a file provided as an argument. I want the output to go to RAM so I don't do unnecessary disk IO. I thought I could use tmpfs and "trick" the program into writing to that; however, not all Linux distros use tmpfs for /tmp: some mount tmpfs under /run (Ubuntu), others under /dev/shm (RedHat). I want my program to be as portable as possible, and I don't want to create tmpfs file systems on the user's system if I can avoid it. Obviously I can do `df | grep tmpfs` and use whatever mount that returns, but I was hoping for something a bit more elegant. Is it possible to write to a pseudo-terminal, or maybe to /proc somewhere?
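For the `df | grep tmpfs` fallback mentioned above, scanning the mount table directly is a bit more robust than parsing `df` output. A minimal sketch, assuming Linux's `/proc/mounts` is available (the function name is made up for the example):

```python
def find_tmpfs(mounts_path="/proc/mounts"):
    """Return the mount points of all mounted tmpfs filesystems."""
    mounts = []
    with open(mounts_path) as f:
        for line in f:
            # each line: device mountpoint fstype options dump pass
            fields = line.split()
            if len(fields) >= 3 and fields[2] == "tmpfs":
                mounts.append(fields[1])
    return mounts
```

Any of the returned mount points (e.g. /run or /dev/shm) would still need to be probed for write permission before use.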

ventsyv
  • It's probably going to RAM anyway, due to Linux's disk cache. – Oliver Charlesworth Sep 03 '14 at 21:24
  • Since you've tagged this question `python`, how about `StringIO`? It lets you use a string variable as a file, basically. – kindall Sep 03 '14 at 21:28
  • Is this for a program you're writing, or one that's already been written that you can't modify? Your question is contradictory on the matter. –  Sep 03 '14 at 21:32
  • I'm writing a Python program that's calling another program repeatedly. StringIO won't work: the program I'm calling from my script takes the location of the file to write to, not an actual file handle. – ventsyv Sep 03 '14 at 21:38
  • PyFilesystem might be a good fit then. http://docs.pyfilesystem.org/en/latest/tempfs.html#module-fs.tempfs Basically you are trying to create a userspace ramdisk. I would still try to look up a mounted tmpfs though. (e.g. mount | egrep -wi "tmpfs" | egrep -iv "/sys" | egrep "755" ) – Bjoern Rennhak Sep 03 '14 at 21:43
  • I looked at PyFilesystem as well, but as far as I understand it, I do have to create a filesystem and mount it; that's why I turned to mounted tmpfs. Also, I'm not sure the disk cache will keep the file in memory once the program that created it terminates. – ventsyv Sep 03 '14 at 21:56
  • Will the output eventually go to the disk? Or will it be processed then dump on disk? Or will it be discarded entirely at some point? – damienfrancois Sep 03 '14 at 22:00
  • It will go to disk eventually. My script will call the program a few hundred times then read all of its output back in, concatenate it, then run a bunch of regex on it. – ventsyv Sep 03 '14 at 22:05
  • And the other program does not have an option to write to stdout? – damienfrancois Sep 03 '14 at 22:06
  • That was added later, but I do want to support the older versions and I don't want to have ugly if statements checking the version number all over my script. – ventsyv Sep 03 '14 at 22:11
  • Can you reverse who is calling who? See my suggested answer – damienfrancois Sep 03 '14 at 22:17

3 Answers


Pass /proc/self/fd/1 as the filename to the child program. All of the writes to /proc/self/fd/1 will actually go to the child program's stdout. Use subprocess.Popen(), et al, to capture the child's stdout.
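A sketch of that approach, with a hypothetical one-liner child standing in for the real program (the one-liner and its 'hello' payload are illustrative, not from the original):

```python
import subprocess
import sys

# A stand-in child that, like the real program, only writes to the
# filename it receives as an argument (hypothetical demo one-liner):
child = [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'w').write('hello\\n')",
         "/proc/self/fd/1"]   # resolves to the child's own stdout on Linux

proc = subprocess.Popen(child, stdout=subprocess.PIPE)
out, _ = proc.communicate()   # the "file" contents arrive via the pipe
print(out)
```

Since the child's stdout is a pipe to the parent, the data never touches disk.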

Robᵩ
  • That should work on all flavors of Linux and the user running the program will always have permissions to that file right? – ventsyv Sep 03 '14 at 23:01
  • Probably, and yes, respectively. – Robᵩ Sep 04 '14 at 02:39
  • @user2036161 - Here is a discussion of the portability of this technique: http://unix.stackexchange.com/questions/123602/portability-of-file-descriptor-links – Robᵩ Sep 04 '14 at 14:46
  • @ventsyv: here's a code example that demonstrates both named pipes and /dev/fd/N filenames for the same problem: [Multiple pipes in subprocess](https://stackoverflow.com/q/28840575/4279) – jfs Aug 08 '17 at 17:22

You could try named pipes if the child process accepts non-seekable files. The content of a named pipe doesn't touch disk.
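A sketch of the named-pipe route, assuming a POSIX system (the one-liner child again stands in for the real program):

```python
import os
import subprocess
import sys
import tempfile

fifo_dir = tempfile.mkdtemp()
fifo = os.path.join(fifo_dir, "out.fifo")
os.mkfifo(fifo)              # a name on disk, but data stays in kernel buffers

# Stand-in child that writes to the filename it is given:
child = [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'w').write('hello\\n')",
         fifo]

proc = subprocess.Popen(child)   # child starts, then blocks opening the pipe...
with open(fifo) as f:            # ...until we open the read end here
    data = f.read()
proc.wait()
os.unlink(fifo)
os.rmdir(fifo_dir)
```

The caveat is exactly the one stated above: the child must not try to seek in its output file, since pipes are not seekable.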

jfs
  • I ended up writing it as a threaded program: I create the pipe, spawn a thread that calls the second program, and in the main thread I open the pipe, which unblocks the writer, and that's how I get the data from the other process. Works pretty well. – ventsyv Oct 23 '14 at 20:34

You could split your Python script into two parts: one that repeatedly calls the other program, and one that merges the results. Turn the former into a Bash script, so you can use the >() process-substitution construct to pass the other program a pseudo-file that is actually the stdin of another process.

PoC:

Assume this is the other program:

$ cat otherprogram.py
#!/usr/bin/env python
import sys

with open(sys.argv[1], 'w') as file:
    file.write('Hello\n')

It takes a filename as an argument and writes 'Hello' to it. Suppose you need to call it five times. Then you can do something like this:

for i in {1..5}; do python otherprogram.py >(cat) ; done

That will print to stdout what otherprogram.py thinks it is writing to a file. You can then consume it with the other part of your Python script, like this:

$ cat consume.py 
#!/usr/bin/env python

import fileinput

for line in fileinput.input():
        print "Processing line ", line

(this simply prepends something to the 'Hello')

$ { for i in {1..5}; do python otherprogram.py >(cat) ; done } | python consume.py
Processing line  Hello

Processing line  Hello

Processing line  Hello

Processing line  Hello

Processing line  Hello

So what otherprogram.py thinks it is writing to a file, it actually sends to your program's stdin without hitting the disk, thanks to Bash's process-substitution mechanism.
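As the comments below suggest, the same construct can also be driven from Python without a separate Bash script, by handing the command line to `bash -c`. A sketch, with a hypothetical one-liner standing in for otherprogram.py:

```python
import subprocess
import sys

# Let bash expand >(cat) into a /dev/fd/N path for the stand-in child;
# cat forwards whatever the child "writes to a file" onto our pipe.
inner = "import sys; open(sys.argv[1], 'w').write('Hello\\n')"
script = '{0} -c "{1}" >(cat)'.format(sys.executable, inner)
out = subprocess.check_output(["bash", "-c", script])
print(out)
```

This requires bash (not sh) on the user's PATH, since >() is a bashism.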

damienfrancois
  • I think that will do it! I probably don't even need to split it into a bash script, I can call it directly from python through a subprocess.Popen() call. I'll give it a try later tonight and I'll let you know how it works. – ventsyv Sep 03 '14 at 22:42
  • @user2036161: `>()` is probably implemented via `/proc/self/fd/` where available and using named pipes elsewhere – jfs Sep 03 '14 at 23:43