0

I'm quite new to node.js and I'm using it as a server to receive http.get requests from a browser, do some processing and return a result to the browser.

The processing is actually scraping a website using phantom.js/casper.js and because of the complications with using casper.js in node.js I'm executing it as a sub-process with the use of a shell script.

When I run the code below and make a request from the browser, the response seems to appear before the shell script has run. I tried using the async library to run the steps in series so that run.sh runs and returns a value before it's added to the response but it doesn't seem to work like that. Can anyone spot my stupid mistake?

Thanks in advance!

var express = require('express');
var app = express();
var sys = require('sys');
var exec = require('child_process').exec;
var async = require('async');
var auth = express.basicAuth('test', 'pass');
var server = app.listen(3000, function() {
    console.log('Listening on port %d', server.address().port);
});
var result;


//app.use(auth);

app.get('/free/:u/:p', auth, function(req, res) { 
    async.series([
        function(callback){
            var puts = function (error, stdout, stderr) {
                result = stdout;
             }
             exec("./run.sh " + req.params.u + " " + req.params.p, puts);
             callback(null);
        },
        function(callback){
             res.writeHead(200,{'Content-Type':'application/json'});
             res.send(result)
             callback(null);
        }
    ]);
});

app.get('/', function(req, res) {
  res.send('Hello!');
});

3 Answers3

0

You're not calling callback(null) asynchronously. You need to put it in puts.

Also, you shouldn't use a global results variable (it's even global across requests, possibly leaking data to different clients - arghh!). And there's hardly a reason to use async.js for this trivial chain.

app.get('/free/:u/:p', auth, function(req, res) { 
    exec("./run.sh " + req.params.u + " " + req.params.p, function (error, stdout, stderr) {
       res.writeHead(200,{'Content-Type':'application/json'});
       res.send(stdout);
    });
});

Notice the security issue that was introduced by not escaping shell parameters.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • perfect, my code really was poor compared to this! Now I have a different problem, it only runs if I comment out the res.writeHead part. It seems to be running send before the writeHead. This is the error I get: Error: Can't set headers after they are sent. Also - what is the security issue you mention - I'm a little lost.. – arbitrage_junkie Jun 02 '14 at 22:33
  • Hm, it shouldn't. Are there other `send()`s in the code? The vulnerability I'm talking about is a basic [command injection](http://www.blackhatlibrary.net/Command_Injection). Browse `/free/;/rm+-r+\.` (or better: dont!) Read [this question about escaping](http://stackoverflow.com/questions/1779858/how-do-i-escape-a-string-for-a-shell-command-in-nodejs-v8-javascript-engine) – Bergi Jun 02 '14 at 22:55
  • no other send() in the code other than in the '/' route code (as above).. I understand why this is a massive security issue now! Makes sense. I don't really follow the escaping solution though. Do I need to use some kind of regex to exclude anything which could be executed unintentionally? – arbitrage_junkie Jun 03 '14 at 13:16
  • No, you need to wrap your string arguments in quotes and use a regex to escape any string-delimiters. Or you simply use `spawn()` instead of `exec()`, which does take an array of arguments and escapes them appropriately. – Bergi Jun 03 '14 at 16:12
  • ok thanks. So from what I've read spawn sends back a stream of data rather than it being capped at the buffer size when using exec. I can't find any material on the automatic escaping of shell params. Also, I will have to capture the streams which leads me back to the original asynchronous problem... arrghh. Any chance you can re-write above for me using spawn instead please? – arbitrage_junkie Jun 04 '14 at 20:14
  • I think `spawn("sh", ["./run.sh", req.params.u, req.params.p])` would do it. Maybe change `sh` for `bash` (or whatever shell you're using). Then you need to manually collect the stream data [like in this answer](http://stackoverflow.com/a/1780120/1048572). – Bergi Jun 04 '14 at 20:23
0

Basically the problem you have is that you are invoking exec as if it was a synchronous function, when in fact, it is asynchronous.

So you should actually do:

exec("./run.sh " + req.params.u + " " + req.params.p, puts, function(){
   callback(null);
});

Or simply do it this way, since anyway you don't see to care about function parameters being passed by your functions in the async series.

exec("./run.sh " + req.params.u + " " + req.params.p, puts, callback)
Edwin Dalorzo
  • 76,803
  • 25
  • 144
  • 205
0

The reason is that you run the shell command and then immediately invoke the callback which runs the next function (the response).

To fix it, You can change the puts function puts like this and remove the callback(null):

var express = require('express');
var app = express();
var sys = require('sys');
var exec = require('child_process').exec;
var async = require('async');
var auth = express.basicAuth('test', 'pass');
var server = app.listen(3000, function() {
    console.log('Listening on port %d', server.address().port);
});
var result;


//app.use(auth);

app.get('/free/:u/:p', auth, function(req, res) { 
    async.series([
        function(callback){
            var puts = function (error, stdout, stderr) {
                result = stdout;
                callback(null); //added here
             }
             exec("./run.sh " + req.params.u + " " + req.params.p, puts);
             //callback(null);  //commented
        },
        function(callback){
             res.writeHead(200,{'Content-Type':'application/json'});
             res.send(result)
             callback(null);  
        }
    ]);
});

app.get('/', function(req, res) {
  res.send('Hello!');
});

While this approach works, it's better to use async.waterfall instead of async.series. It allows you to pass the result of each function to the next one. So, you don't need to have a global result variable.

advncd
  • 3,787
  • 1
  • 25
  • 31