I'm trying to read the output of a Python script launched by Node.js as it arrives. However, I only get access to the data once the process has finished.

var spawn = require('child_process').spawn;
var proc, args;

args = [
    './bin/build_map.py',
    '--min_lon',
    opts.sw.lng,
    '--max_lon',
    opts.ne.lng,
    '--min_lat',
    opts.sw.lat,
    '--max_lat',
    opts.ne.lat,
    '--city',
    opts.city
];

proc = spawn('python', args);

proc.stdout.on('data', function (buf) {
    console.log(buf.toString());
    socket.emit('map-creation-response', buf.toString());
});

If I launch the process with { stdio: 'inherit' }, I can see the output directly in the console as it happens. But then doing something like process.stdout.on('data', ...) does not work.

How do I make sure I can read the output from the child process as it arrives and direct it somewhere else?

Oscar
  • This buffering occurs in the process you start. There is nothing you can do about it in Node; you have to tackle it in the Python program. The canonical question [is here](https://stackoverflow.com/questions/107705/disable-output-buffering). – Hans Passant May 08 '19 at 15:57

2 Answers

The process doing the buffering is Python: it detects that its standard output has been redirected and is not really going to a terminal, so it buffers what it writes. You can easily tell Python not to do this buffering: just run "python -u" instead of "python". It should be as easy as that.
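
As a minimal sketch of how this could be applied to the spawn call from the question: the helper names unbufferedPythonArgs and spawnUnbufferedPython are hypothetical, and 'python' is assumed to be on the PATH. The only real change is the extra -u argument placed before the script path.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: prepend Python's -u flag so the interpreter
// does not block-buffer stdout when it is connected to a pipe.
function unbufferedPythonArgs(scriptArgs) {
  // Convert every value to a string, since numeric options
  // (e.g. coordinates) end up as command-line text anyway.
  return ['-u'].concat(scriptArgs.map(String));
}

// Hypothetical wrapper around spawn(); assumes 'python' is on the PATH.
function spawnUnbufferedPython(scriptArgs, onChunk) {
  const proc = spawn('python', unbufferedPythonArgs(scriptArgs));
  // Forward each chunk of output to the caller as soon as it arrives.
  proc.stdout.on('data', function (buf) {
    onChunk(buf.toString());
  });
  return proc;
}
```

With the question's variables, this might be called as spawnUnbufferedPython(args, function (text) { socket.emit('map-creation-response', text); }).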

Nadav Har'El

When a process is spawned by child_process.spawn(), the streams connected to the child process's standard output and standard error are actually unbuffered on the Nodejs side. To illustrate this, consider the following program:

const spawn = require('child_process').spawn;

var proc = spawn('bash', [
  '-c',
  'for i in $(seq 1 80); do echo -n .; sleep 1; done'
]);

proc.stdout
.on('data', function (b) {
  process.stdout.write(b);
})
.on('close', function () {
  process.stdout.write("\n");
});

This program runs bash and has it emit . characters every second for 80 seconds, while consuming this child process's standard output via data events. You should notice that the dots are emitted by the Node program every second, helping to confirm that buffering does not occur on the Nodejs side.

Also, as explained in the Nodejs documentation on child_process:

By default, pipes for stdin, stdout and stderr are established between the parent Node.js process and the spawned child. It is possible to stream data through these pipes in a non-blocking way. Note, however, that some programs use line-buffered I/O internally. While that does not affect Node.js, it can mean that data sent to the child process may not be immediately consumed.

You may want to confirm that your Python program does not buffer its output. If you feel you're emitting data from your Python program as separate distinct writes to standard output, consider running sys.stdout.flush() following each write to suggest that Python should actually write data instead of trying to buffer it.
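
If editing the Python program to call sys.stdout.flush() is not practical, unbuffered output can usually be requested from the Node.js side instead. The following is a minimal sketch, assuming the child is CPython, which treats a non-empty PYTHONUNBUFFERED environment variable like the -u command-line flag. The helper names withUnbufferedPython and spawnWithUnbufferedEnv are hypothetical.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: copy an environment object and set
// PYTHONUNBUFFERED, without mutating the object passed in.
function withUnbufferedPython(baseEnv) {
  return Object.assign({}, baseEnv, { PYTHONUNBUFFERED: '1' });
}

// Hypothetical wrapper: spawn a command with the adjusted environment.
// This also helps when Python is started indirectly, e.g. by a shell script.
function spawnWithUnbufferedEnv(command, args) {
  return spawn(command, args, { env: withUnbufferedPython(process.env) });
}
```

The environment variable is inherited by grandchild processes, which a -u argument is not, so this variant may be the better fit when the Python interpreter is not the process being spawned directly.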

Update: In this commit that passage from the Nodejs documentation was removed for the following reason:

doc: remove confusing note about child process stdio

It’s not obvious what the paragraph is supposed to say. In particular, whether and what kind of buffering mechanism a process uses for its stdio streams does not affect that, in general, no guarantees can be made about when it consumes data that was sent to it.

This suggests that there could be buffering at play before the Nodejs process receives data. Either way, care should be taken to ensure that processes within your control upstream of Nodejs are not buffering their output.

ctt
  • Take out the `sleep` statement and see what happens. It is buffered, but the buffer is less than a second. – temporary_user_name May 08 '19 at 04:23
  • The buffers at play here have a **size** that is measured in bytes. I'm not aware of any buffers at play here that ensure a delay (e.g. *a buffer of less than a second*). The example demonstrates that the buffer on the `stdout` stream is less than one byte (as data events are emitted as data is produced). If you feel that there is a **delay** at play, it may be worth profiling the Nodejs application to determine where time is spent. Reading data via a Stream a byte at a time will carry considerable overhead, which could explain the delays. This is why buffering is often beneficial. – ctt May 08 '19 at 04:45
  • Perhaps I misspoke, but either way if you take out the `sleep` statements it will buffer. I mean, if you're right, then taking out the sleep statements shouldn't matter. – temporary_user_name May 08 '19 at 04:47
  • What exactly do you observe to suggest that buffering is taking place? How is your definition of buffering measured? Do you feel that the output is buffered because all of the output appears on the screen at the same time, a split second after hitting **enter** to invoke `node`? Or are you using a more objective measure? – ctt May 08 '19 at 04:51
  • @temporary_user_name just to make things more clear, I've included a passage from the documentation on `child_process`. This said, though, there may be delays caused by the overhead related to processing data, unbuffered, using the Streams API. – ctt May 08 '19 at 04:55
  • @temporary_user_name I made yet another update to explain a few more things. The answer is more accurate, but does suggest that it's not **impossible** for buffering or delays to exist in addition to any buffering/delays introduced by Python. It's still worth ensuring that Python is flushing its output as often as possible. – ctt May 08 '19 at 05:15