
I'm controlling long-running simulations (hours, days, even weeks) using a bash script that iterates over all the wanted parameters. If only one simulation runs concurrently, its output is piped through "tee"; otherwise the output is simply redirected (">") to an output file. The output is huge: some log files are ~2 GB and could grow even bigger.

The script works, but it's hell to maintain. When we add a new parameter it takes some time to adapt the script and all the sed-foo in it. So I've ported it to Python, and it's working GREAT.

The only problem preventing me from using it in production is that I can't find the right way to call Popen() to launch the program. If I run it "silently" by piping everything to the file and not showing any output, Python takes gigabytes of RAM before the simulation is done.

Here's the code snippet:

import shlex, subprocess

fh = open(logfile, "w")
pid = subprocess.Popen(shlex.split(command), stdout=fh)   # the child writes straight to the file
pids.append(pid)

I've read a lot of stuff about Popen and its output, but I thought that piping it to a file would flush the buffer when needed?

Maybe subprocess' Popen() is not the best for this? What's the best way to show and save a program's output to screen and file without eating all the RAM?

Thanx!

big_gie
  • Just to clarify - you really want to pipe gigabytes of output to the console in addition to a file? – Greg Hewgill Aug 03 '10 at 13:28
  • `I'm controlling long running simulations (hours, days, even weeks)` Design checkpointing into your application. http://en.wikipedia.org/wiki/Application_checkpointing – MattH Aug 03 '10 at 13:59
  • Yes, but it's not piped all at once: the simulation can run over a week. That is why I want to save it: I want a fine-grained print of what is actually happening, but since there is no way the terminal can buffer that amount of data, I want to save it to a file too. Normally the output to screen shouldn't be that big, but sometimes it is. I don't want the machine to call the OOM killer just because the output buffer is not flushed... Better to use something that is not prone to this. – big_gie Aug 03 '10 at 13:59
  • @MattH Thanx, I have checkpointing already too. That's not an issue here. I want to reproduce bash's "tee" functionality: save and output at the same time, without having to buffer the whole content so python does not end up out of memory. – big_gie Aug 03 '10 at 14:01
  • @big_gie: How many subprocesses is your job going to have open? – MattH Aug 03 '10 at 15:07
  • @MattH A couple: I should not need more than 16. – big_gie Aug 03 '10 at 16:02

4 Answers


Why not write silently to a file and then tail it?

You can use file.flush() to push Python's file buffer out to the file.


Python will happily pick up new lines appended to a file it already has open. For instance:

f = open( "spam.txt", "r" )
f.read()
# 'I like ham!'
# Now open up spam.txt in some other program and add a new line.
f.read()
# 'I like eggs too!'
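
The linked tail implementation isn't reproduced here, but a minimal sketch of a "tail -f"-style follower in Python might look like this (the file name and polling interval are illustrative):

import time

def follow(path, interval=1.0):
    # Yield lines as they are appended to the file, starting from the
    # current end of the file and polling every `interval` seconds.
    f = open(path, "r")
    f.seek(0, 2)                      # 2 == os.SEEK_END
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(interval)

# Usage:
# import sys
# for line in follow("logfile"):
#     sys.stdout.write(line)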
Katriel
  • Tail wouldn't be aware of any new lines. Logic for that would need to be added, and that should be avoided. It wouldn't solve the problem either: it seemed Python was taking 2 GB of RAM when only outputting to a file (not even printing to screen). – big_gie Aug 03 '10 at 14:23
  • The link is to a Python implementation of `tail`; it would work. See new code. – Katriel Aug 03 '10 at 14:38
  • And flushing the file buffer regularly should stop Python using huge amounts of memory. – Katriel Aug 03 '10 at 14:39

Instead of splitting the output of your simulation, pipe it into a file (or write to a file from within the simulation), then use tail -f to watch the latest output in a console.

Example simulation:

#!/bin/bash

while true; do
  echo $$ -- $(date +%s)
  sleep 1
done

Or perhaps:

#!/usr/bin/env python
import os, sys, time

while True:
  sys.stdout.write("%d -- %d\n"%(os.getpid(), time.time()) )
  sys.stdout.flush()
  time.sleep(1)

Invocation:

$ nohup ./simulation &> logfile &

Watching the output:

$ tail -f logfile
1285 -- 1337166243
1285 -- 1337166244
1285 -- 1337166245
1285 -- 1337166246
1285 -- 1337166247
^C

Notes:

  • Bonus points for splitting stderr and stdout to different logfiles.
  • Don't use tee for things like this. It's fragile and will propagate errors to your simulation in case something bad happens at the pipe's end.
  • Note how we record the PID of the simulation so that we can abort it after it has been started if we want. It's recommended that you store this in a pidfile instead of the simulation log, purely for simplicity when killing your simulation (a sketch follows this list).
  • Use nohup. This will protect your simulation run in case you close the originating terminal, or if X11 crashes (from experience, this will happen when your 4 day simulation is 98% complete, and you haven't implemented checkpoints...).
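
Since the question's launcher is written in Python, here is a minimal sketch of the same ideas from Python: split stdout and stderr into separate logs and record the child's PID in a pidfile (the command and file names are purely illustrative; nohup-style detaching is not covered here):

import shlex
import subprocess

command = "./simulation --param 42"           # illustrative command
out = open("sim.stdout.log", "w")             # separate log for stdout
err = open("sim.stderr.log", "w")             # separate log for stderr
proc = subprocess.Popen(shlex.split(command), stdout=out, stderr=err)

pidfile = open("simulation.pid", "w")         # record the PID so the run can
pidfile.write("%d\n" % proc.pid)              # be killed later, e.g. from a shell:
pidfile.close()                               #   kill $(cat simulation.pid)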
brice

The simplest solution was to change the simulation's code so it writes to stdout AND a log file itself. Then the output does not need to be saved using tee or a pipe.

import shlex, subprocess, sys

pipe_verbose = sys.stdout
pipe_silent  = open('/dev/null', 'w')

# Only one of the two calls below is actually used, depending on a verbosity flag:
subprocess.Popen(shlex.split(command), stdout=pipe_silent)
subprocess.Popen(shlex.split(command), stdout=pipe_verbose)

and finally I poll() to see when they are done.

Piping has the nice side effect that if I Ctrl+C the script, it kills the job too. If I did not put stdout=... in the Popen(), the job would continue in the background. Also, Python's CPU usage stays at 0% that way; a readline loop on a pipe would raise it to 100%...
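
The poll() loop itself isn't shown above; a minimal sketch of what it might look like, assuming procs is the list of Popen objects collected when launching the simulations:

import time

def wait_for_all(procs, interval=5.0):
    # poll() returns None while a process is still running,
    # and its exit code once it has finished.
    while any(p.poll() is None for p in procs):
        time.sleep(interval)
    return [p.returncode for p in procs]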

big_gie
  • He's most definitely running the same command twice, in parallel. – ddotsenko Aug 04 '10 at 10:57
  • Actually, all this is enclosed in a function. I just pasted the relevant lines about how the subprocesses are being launched. I wanted to emphasize the stdout= option and its pipe. – big_gie Aug 05 '10 at 04:18
  • Sorry big_pie, but running the same command twice, especially for a long-running (and computationally expensive) simulation is a terrible solution. -1 – brice May 16 '12 at 11:13

If the output has reliably occurring delimiters (markers indicating the end of an output section), consider doing the "bad" thing and reading the stdout chunks from the subprocess in a separate thread, writing individual chunks to the log and flushing them with every write.

Take a look here for some examples of non-blocking reads from subprocess' pipe:

How can I read all availably data from subprocess.Popen.stdout (non blocking)?
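
A minimal sketch of that reader-thread idea, assuming line-oriented output (the command name and log file are illustrative): a background thread tees the child's stdout to both the screen and a log file, flushing the log after every line.

import shlex
import subprocess
import sys
import threading

def tee_output(proc, log_path):
    # Copy each line of the child's output to the console and to the
    # log file, flushing the log on every write.
    log = open(log_path, "w")
    for line in iter(proc.stdout.readline, ""):
        sys.stdout.write(line)
        log.write(line)
        log.flush()
    log.close()

command = "./simulation --param 42"               # illustrative command
proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                        universal_newlines=True)
reader = threading.Thread(target=tee_output, args=(proc, "simulation.log"))
reader.start()
proc.wait()
reader.join()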

ddotsenko