
I have a shell script with three echo statements:

echo 'first message'
echo 'second message'
echo 'third message'

I then run this script in node and collect the output via this code:

var child = process.spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});

But the single output is "first message\nsecond message\nthird message\n", which is a problem. I expected three separate outputs, not one smushed together by some form of buffering. And I can't just split on newlines, because the individual outputs may themselves contain newlines.

Is there any way to distinguish the messages of individual echo statements? (Or other output commands, e.g. printf, or anything that causes data to be written to stdout or stderr.)

Edit: I have tried unbuffer and stdbuf; neither works here, as simple testing demonstrates. Here is an example of the stdbuf attempt, which I tried with a variety of argument values, essentially all possible options:

var child = process.spawn('stdbuf', ['-i0', '-o0', '-e0', './test.sh']);

To be clear, this problem happens when I run a python script from node, too, with just three simple print statements. So it's language-agnostic; it's not about bash scripting in particular. It's about successfully detecting the individual outputs of a script in any language on a unix-based system. If this is something C/C++ can do and I have to hook into that from node, I'm willing to go there. Any working solution is welcome.


Edit: I solved the problem for myself initially by piping the script's output to sed and using s/$/uniqueString/ to append an identifier to each individual output, then just splitting the received data on that identifier.

The answer I gave the bounty to will work on single-line outputs, but breaks on multi-line outputs. A mistake in my testing led me to think that was not the case, but it is. The accepted answer is the better solution and will work on outputs of any size. But if you can't control the script and have to handle user-created scripts, then my sed solution is the only thing I've found that works. And it does work, quite well.
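
In sketch form, it looked roughly like this. The marker string is whatever unique identifier you pick, and I'm showing GNU sed's -u (unbuffered) flag for illustration; treat both as placeholders rather than my exact invocation:

var process = require('child_process');

// Placeholder marker; any string that cannot occur in real output works.
var MARKER = '__uniqueString__';

// Pipe the script through sed, which appends the marker to its output.
var child = process.spawn('sh', ['-c', "./test.sh | sed -u 's/$/" + MARKER + "/'"]);

var buffered = '';
child.stdout.on('data', data => {
   buffered += data.toString('utf8');
   var parts = buffered.split(MARKER + '\n');
   buffered = parts.pop();   // keep any incomplete tail for the next chunk
   parts.forEach(p => console.log(JSON.stringify(p)));
});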

temporary_user_name
  • You know the easiest solution is you can just add `sleep 1` between each echo statement and it works. – adamrights May 16 '19 at 22:53
  • It may technically allow you to detect separate echo statements, but it comes at the cost of arbitrarily slowing your program down, potentially by orders of magnitude. It's a terrible solution and strongly recommended against in the presence of many superior alternatives. – temporary_user_name May 17 '19 at 05:39

7 Answers


You can use the readline interface provided as part of the Node APIs; more information here: https://nodejs.org/api/readline.html#readline_event_line. Keep using spawn as you are, but pass the child's stdout to readline so that it can parse the lines. Not sure if this is what you intend to do. Here is some sample code:

var process = require('child_process');
const readline = require('readline');

var child = process.spawn('./test.sh');

// Use readline interface
const readlinebyline = readline.createInterface({ input: child.stdout });

// Fires once for every line of output
readlinebyline.on('line', (line) => {
    // line is already a string, so no toString() is needed
    console.log(JSON.stringify(line));
});

Output:

"first message"
"second message"
"third message"

If you get an error like TypeError: input.on is not a function, make sure the test.sh script is executable (chmod +x test.sh).

manishg

The C library that underlies bash and python is the one that buffers stdout (per line at a terminal, per block when writing to a pipe). stdbuf and unbuffer deal with that, but not with the buffering done by the operating system.

Linux, for example, gives the pipe between your node.js process and the bash process a 64 KiB kernel buffer by default (and guarantees only that writes of up to PIPE_BUF = 4096 bytes are atomic).

Truth is, there's no honest way for a process on one end of the pipe (node.js) to see individual writes (echo calls) on the other end. This isn't the right design (you could communicate via individual files instead of stdout).

If you insist, you can try to fool the OS scheduler: if the writer pauses long enough between writes, the reader process (node.js) gets scheduled in and reads whatever is currently in the OS buffer.

I tested this on Linux:

$ cat test.sh 
echo 'first message'
sleep 0.1
echo 'second message'
sleep 0.1
echo 'third message'
$ cat test.js 
const child_process = require('child_process');
var child = child_process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   global.process.stdout.write(data); // notice global object
});
$ node test.js
"first message\n""second message\n""third message\n"
root
  • Even I thought of this solution, but I thought making changes to external files is not a good approach. – PrivateOmega May 14 '19 at 06:51
  • Can you elaborate a little on "This isn't the right design (you could communicate via individual files instead of stdout)." ? I'm intrigued by it, but adding `sleep` statements to the script is definitely not the right solution here. – temporary_user_name May 17 '19 at 05:48
  • @temporary_user_name "isn't the right design" the writer (echo) needs to frame individual messages - which is what the accepted solution does. – root May 20 '19 at 02:11

I ran into the same problem on a previous project. I used echo's -e switch (interpret backslash escapes) and then split the string on a non-printable character.

Example:

echo -e 'one\u0016'
echo -e "two\u0016"
echo -e 'three\u0016'

Result:

"one\u0016\ntwo\u0016\nthree\u0016\n"

And the corresponding JavaScript:

var process = require('child_process');

var child = process.spawn('./test.sh');
var pending = '';   // holds any partial message that spans chunks
child.stdout.on('data', data => {
   pending += data.toString('utf8');
   var values = pending.split('\u0016\n');
   pending = values.pop();   // last element is an incomplete tail (or empty)
   if (values.length) console.log(values);
});
itsben

If you expect the output from test.sh to always be sent line by line, then IMHO your best choice is to use readline:

const readline = require('readline');
const {spawn} = require('child_process');

const child = spawn('./test.sh');
const rl = readline.createInterface({
    input: child.stdout
});

rl.on('line', (input) => {
    console.log(`Received: ${input}`);
});
maurycy

Do not use console.log; write the raw data straight through:

const process_module = require('child_process');

var child = process_module.spawn('./test.sh');
child.stdout.on('data', data => {
   process.stdout.write(data);
});

UPDATE (just to show the difference between a local variable named process and the global process object):

const process = require('child_process');

var child = process.spawn(`./test.sh`);
child.stdout.on('data', data => {
   global.process.stdout.write(data); // notice global object
});

The files I've used to test this script are:

Python:

#!/usr/bin/env python

print("first message")
print("second message")
print("third message")

Bash:

#!/usr/bin/env bash

echo 'first message'
echo 'second message'
echo 'third message'

The output:

first message
second message
third message

Make sure they are executable scripts:

chmod a+x test.sh
chmod a+x test.py
calbertts

There is a very simple solution to this. Simply add a sleep 1 to your bash script and the .on('data') handler won't combine the outputs.

So, a script like this:

#!/bin/bash
echo 'first message'
sleep 1
echo 'second message'
sleep 1
echo 'third message'

And your exact script (with a fix for the missing require('child_process')):

var process = require('child_process');
var child = process.spawn('./test.sh');
child.stdout.on('data', data => {
   data = JSON.stringify(data.toString('utf8'));
   console.log(data);
});
adamrights

If you're trying to split and interpret each message, this might help. (I don't have much experience with node, sorry if I got something wrong.)

test.sh:

#!/bin/bash
echo -n 'first message'
echo -ne '\0'
echo -n 'second message'
echo -ne '\0'
echo -n 'third message'
echo -ne '\0'

node:

var process = require('child_process');

var child = process.spawn('./test.sh');
var data_buffer = Buffer.alloc(0);   // accumulates raw bytes across events
var data_array  = [];
child.stdout.on('data', data => {
  data_buffer = Buffer.concat([data_buffer, data]);
  while (data_buffer.includes(0)) {          // 0 is the NUL delimiter byte
    let i = data_buffer.indexOf(0);
    let s = data_buffer.slice(0, i);         // one complete message
    data_array.push(s);
    data_buffer = data_buffer.slice(i + 1);  // drop message + delimiter
    let json = JSON.stringify(s.toString('utf8'));
    console.log('--8<-------- split ------------');
    console.log('index: ' + i);
    console.log('received: ' + s);
    console.log('json: ' + json);
    console.log(data_array);
  }
});

This essentially uses NUL-delimited strings instead of newline-delimited ones. Another option would be to use IFS, but I didn't manage to get that working. This method saves you from having to use readline.

One thing to note is that you have to keep the received data in a variable that outlives the event handler, since you can't control how the chunks of data arrive (I don't know if there's a way to control that). Having said that, you can keep it small by cutting off the already-interpreted part, hence the second slice.

For this to work, of course you have to make sure you don't have any null characters in your data. But you can change the delimiting character if you do.

This approach is more thorough, IMHO.

If you need python3:

#!/usr/bin/python3
print("first message", end = '\x00')
print("second message", end = '\x00')
print("third message", end = '\x00')
GGets