
I have a shell script with three echo statements:

echo 'first message'
echo 'second message'
echo 'third message'

I then run this script in node and collect the output via this code:

var child = process.spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});

But the single output is "first message\nsecond message\nthird message\n", which is a problem. I expected three separate outputs, not one smushed together by some form of buffering. And I can't just split on newlines, because the individual outputs may themselves contain newlines.

Is there any way to distinguish the messages of individual echo statements? (Or other output commands, e.g. printf, or anything that causes data to be written to stdout or stderr.)

Edit: I have tried unbuffer and stdbuf; neither works here, as simple testing demonstrates. Here is an example of the stdbuf attempt, which I tried with a variety of argument values, essentially all possible options:

var child = process.spawn('stdbuf', ['-i0', '-o0', '-e0', './test.sh']);

To be clear, this problem happens when I run a python script from node, too, with just three simple print statements. So it's language-agnostic; it's not about bash scripting in particular. It's about successfully detecting the individual outputs of a script in any language on a unix-based system. If this is something C/C++ can do and I have to hook into that from node, I'm willing to go there. Any working solution is welcome.


Edit: I solved the problem for myself initially by piping the script's output to sed and using s/$/uniqueString/ to append an identifier to each individual output, then just splitting the received data on that identifier.

The answer I gave the bounty to will work on single-line outputs, but breaks on multi-line outputs. A mistake in my testing led me to think that was not the case, but it is. The accepted answer is the better solution and will work on outputs of any size. But if you can't control the script and have to handle user-created scripts, then my sed solution is the only thing I've found that works. And it does work, quite well.
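
In sketch form, it looked roughly like this. The marker string is whatever unique identifier you pick, and I'm showing GNU sed's -u (unbuffered) flag for illustration; treat both as placeholders rather than my exact invocation:

var process = require('child_process');

// Placeholder marker; any string that cannot occur in real output works.
var MARKER = '__uniqueString__';

// Pipe the script through sed, which appends the marker to its output.
var child = process.spawn('sh', ['-c', "./test.sh | sed -u 's/$/" + MARKER + "/'"]);

var buffered = '';
child.stdout.on('data', data => {
   buffered += data.toString('utf8');
   var parts = buffered.split(MARKER + '\n');
   buffered = parts.pop();   // keep any incomplete tail for the next chunk
   parts.forEach(p => console.log(JSON.stringify(p)));
});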

temporary_user_name
  • You know the easiest solution is you can just add `sleep 1` between each echo statement and it works. – adamrights May 16 '19 at 22:53
  • It may technically allow you to detect separate echo statements, but it comes at the cost of arbitrarily slowing your program down, potentially by orders of magnitude. It's a terrible solution and strongly recommended against in the presence of many superior alternatives. – temporary_user_name May 17 '19 at 05:39

7 Answers


You can use the readline interface provided as part of the Node APIs; more information here: https://nodejs.org/api/readline.html#readline_event_line. Keep using spawn as you are, but pass the child's stdout to readline so that it can parse the lines. Not sure if this is what you intend to do. Here is some sample code:

var process = require('child_process');
const readline = require('readline');

var child = process.spawn('./test.sh');

// Use readline interface
const readlinebyline = readline.createInterface({ input: child.stdout });

// Fires once for every line of output
readlinebyline.on('line', (line) => {
    // line is already a string, so no toString() is needed
    console.log(JSON.stringify(line));
});

Output:

"first message"
"second message"
"third message"

If you get an error like TypeError: input.on is not a function, make sure the test.sh script is executable (chmod +x test.sh).

manishg

The C library that underlies bash and python is the one that buffers stdout (per line at a terminal, per block when writing to a pipe). stdbuf and unbuffer deal with that, but not with the buffering done by the operating system.

Linux, for example, gives the pipe between your node.js process and the bash process a 64 KiB kernel buffer by default (and guarantees only that writes of up to PIPE_BUF = 4096 bytes are atomic).

Truth is, there's no honest way for a process on one end of the pipe (node.js) to see individual writes (echo calls) on the other end. This isn't the right design (you could communicate via individual files instead of stdout).

If you insist, you can try to fool the OS scheduler: if the writer pauses long enough between writes, the reader process (node.js) gets scheduled in and reads whatever is currently in the OS buffer.

I tested this on Linux:

$ cat test.sh 
echo 'first message'
sleep 0.1
echo 'second message'
sleep 0.1
echo 'third message'
$ cat test.js 
const child_process = require('child_process');
var child = child_process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   global.process.stdout.write(data); // notice global object
});
$ node test.js
"first message\n""second message\n""third message\n"
root
  • Even I thought of this solution, but I thought making changes to external files is not a good approach. – PrivateOmega May 14 '19 at 06:51
  • Can you elaborate a little on "This isn't the right design (you could communicate via individual files instead of stdout)." ? I'm intrigued by it, but adding `sleep` statements to the script is definitely not the right solution here. – temporary_user_name May 17 '19 at 05:48
  • @temporary_user_name "isn't the right design" the writer (echo) needs to frame individual messages - which is what the accepted solution does. – root May 20 '19 at 02:11

I ran into the same problem on a previous project. I used echo's -e switch (interpret backslash escapes) and then split the string on a non-printable character.

Example:

echo -e 'one\u0016'
echo -e "two\u0016"
echo -e 'three\u0016'

Result:

"one\u0016\ntwo\u0016\nthree\u0016\n"

And the corresponding JavaScript:

var process = require('child_process');

var child = process.spawn('./test.sh');
var pending = '';   // holds any partial message that spans chunks
child.stdout.on('data', data => {
   pending += data.toString('utf8');
   var values = pending.split('\u0016\n');
   pending = values.pop();   // last element is an incomplete tail (or empty)
   if (values.length) console.log(values);
});
itsben

If you expect the output from test.sh to always be sent line by line, then IMHO your best choice is to use readline:

const readline = require('readline');
const {spawn} = require('child_process');

const child = spawn('./test.sh');
const rl = readline.createInterface({
    input: child.stdout
});

rl.on('line', (input) => {
    console.log(`Received: ${input}`);
});
maurycy

Do not use console.log; write the raw data straight through:

const process_module = require('child_process');

var child = process_module.spawn('./test.sh');
child.stdout.on('data', data => {
   process.stdout.write(data);
});

UPDATE (just to show the difference between a local variable named process and the global process object):

const process = require('child_process');

var child = process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   global.process.stdout.write(data); // notice global object
});

The files I've used to test this script are:

Python:

#!/usr/bin/env python

print("first message")
print("second message")
print("third message")

Bash:

#!/usr/bin/env bash

echo 'first message'
echo 'second message'
echo 'third message'

The output:

first message
second message
third message

Make sure they are executable scripts:

chmod a+x test.sh
chmod a+x test.py
calbertts

There is a very simple solution to this. Simply add a sleep 1 to your bash script and the .on('data') handler won't combine the outputs.

So, a script like this:

#!/bin/bash
echo 'first message'
sleep 1
echo 'second message'
sleep 1
echo 'third message'

And your exact script (with a fix for the missing require('child_process')):

var process = require('child_process');
var child = process.spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});
adamrights

If you're trying to split and interpret each message, this might help. (I don't have much experience with node, sorry if I got something wrong.)

test.sh:

#!/bin/bash
echo -n 'first message'
echo -ne '\0'
echo -n 'second message'
echo -ne '\0'
echo -n 'third message'
echo -ne '\0'

node:

var process = require('child_process');

var child = process.spawn('./test.sh');
var data_buffer = Buffer.alloc(0);   // accumulates raw bytes across events
var data_array  = [];
child.stdout.on('data', data => {
  data_buffer = Buffer.concat([data_buffer, data]);
  while (data_buffer.includes(0)) {          // 0 is the NUL delimiter byte
    let i = data_buffer.indexOf(0);
    let s = data_buffer.slice(0, i);         // one complete message
    data_array.push(s);
    data_buffer = data_buffer.slice(i + 1);  // drop message + delimiter
    let json = JSON.stringify(s.toString('utf8'));
    console.log('--8<-------- split ------------');
    console.log('index: ' + i);
    console.log('received: ' + s);
    console.log('json: ' + json);
    console.log(data_array);
  }
});

This essentially uses NUL-delimited strings instead of newline-delimited ones. Another option would be to use IFS, but I didn't manage to get that working. This method saves you from having to use readline.

One thing to note is that you have to keep the received data in a variable that outlives the event handler, since you can't control how the chunks of data arrive (I don't know if there's a way to control that). Having said that, you can keep it small by cutting off the already-interpreted part, hence the second slice.

For this to work, of course you have to make sure you don't have any null characters in your data. But you can change the delimiting character if you do.

This approach is more thorough, IMHO.

If you need python3:

#!/usr/bin/python3
print("first message", end = '\x00')
print("second message", end = '\x00')
print("third message", end = '\x00')
GGets