
I'm having trouble creating processes in parallel with Node and exiting them when they're done with a simple HTTP GET request. I've noticed that if I fire a process.exit() inside of a callback for appendFile, some files will not be created or appended in a Node cluster setup. Ideally, the way below is how I would like to fire events, since the process is exited as soon as the job is done:

var rp = require("request-promise"),
    config = require("./config"),
    cluster = require("cluster"),
    os = require("os"),
    fs = require("fs");

var keywordArray = [
    'keyword1',
    'keyword2',
    ...
];

if (cluster.isMaster) {

    var numCPUs = os.cpus().length;
    var clusterDivision = Math.ceil(keywordArray.length/numCPUs);

    // Reset the json if previously set
    keywordArray.forEach(function(arrayItem) {
        fs.unlink(config.dataDirectory + arrayItem + '.json', function(err) {
            if (err) return console.error(err);
            console.log('successfully unlinked ' + arrayItem + '.json from ' + config.dataDirectory);
        });
    });

    // Create a worker for each CPU
    // Separate the array out evenly for each worker
    for (var j = 1; j <= numCPUs; j++) {
        var removed = keywordArray.splice(0, clusterDivision);
        var removed = keywordArray.splice(0, clusterDivision);
        if (removed.length > 0) {
            // The array contains something so let's do something with the keyword
            console.log('creating a worker');
            cluster.fork().send(removed);
        } else {
            // We don't need a cluster here
        }
    }

    process.on('exit', function() {
        console.log('exited');
    });

} else if (cluster.isWorker) {
    //  Code to run if we're in a worker process

    // Receive the keywords the master sent so they're available to this worker
    process.on('message', function(separatedArrayItem) {

        separatedArrayItem.forEach(function(arrayItem) {
            function radarRequest(err, response, body) {
                if (err) return console.error(err);
                var responseBody = JSON.parse(body);
                console.log(arrayItem);
                fs.appendFile(config.dataDirectory + arrayItem + '.json', JSON.stringify(responseBody.results, null, '\t'), function (err) {
                    if (err) return console.error(err);
                    console.log('success writing file');
                });
            }

            rp({
                url: config.radarSearchURI + 
                '?key='+ config.apiKey + 
                '&location=' + config.latitude + ',' + config.longitude + 
                '&radius=' + config.searchRadius + 
                '&keyword=' + arrayItem, headers: config.headers
            }, radarRequest);
        });

        setTimeout(function() {
            process.exit(0);
        }, 5000);
    });
}

The only way I can make sure all files are properly appended is by using a timeout, which is exactly what I don't want to - and shouldn't - do. Is there another way I can ensure an appendFile has completed successfully and then kill the Node process? Here's a way that works (assuming the process doesn't take longer than 5 seconds):

process.on('message', function(separatedArrayItem) {

    separatedArrayItem.forEach(function(arrayItem) {
        function radarRequest(err, response, body) {
            if (err) return console.error(err);
            var responseBody = JSON.parse(body);
            console.log(arrayItem);
            fs.appendFile(config.dataDirectory + arrayItem + '.json', JSON.stringify(responseBody.results, null, '\t'), function (err) {
                if (err) return console.error(err);
                console.log('success writing file');
            });
        }

        rp({
            url: config.radarSearchURI + 
            '?key='+ config.apiKey + 
            '&location=' + config.latitude + ',' + config.longitude + 
            '&radius=' + config.searchRadius + 
            '&keyword=' + arrayItem, headers: config.headers
        }, radarRequest);
    });

    setTimeout(function() {
        process.exit(0);
    }, 5000);
});
  • Check out whether [writing with streams helps](http://stackoverflow.com/questions/2496710/writing-files-in-node-js/6958773#6958773), or writeSync – laggingreflex Feb 09 '15 at 19:21
  • Why is calling `process.exit` necessary in this case? It should exit when all of the child processes are closed. – loganfsmyth Feb 09 '15 at 19:25
  • They never all exit, as I split the process up with the cluster. I'll want to do more clustered actions after I get these, so I want to make sure they're closed before creating new ones. Is that not how I should do it? –  Feb 09 '15 at 19:27
  • @laggingreflex write streams also perform with the same result. They work perfectly; however, I really want to close the process once this is complete, as I intend to perform more actions once I have these files created. –  Feb 09 '15 at 20:06
  • Check out http://stackoverflow.com/questions/28426394/why-does-this-node-script-exit-in-node-0-12-but-not-in-0-10 – laggingreflex Feb 10 '15 at 09:22

2 Answers

You can use an async flow control module like async to kill the process after all files are written. I'd also recommend cluster.worker.disconnect() so that the node process will simply exit gracefully, but that isn't a requirement.

async.forEach(separatedArrayItem, function(item, done){
    // Append the file here, and pass 'done' as the appendFile callback
    // so async knows when this item's write has finished.

}, function(){
    // Will be called when all item 'done' functions have been called.
    cluster.worker.disconnect();
});
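
Applied to the worker from the question, the message handler could look something like the sketch below. It is untested against your setup and assumes the same rp, fs, config, and cluster requires as your script, keeping your callback-style use of rp:

var async = require('async');

process.on('message', function(separatedArrayItem) {
    async.forEach(separatedArrayItem, function(arrayItem, done) {
        rp({
            url: config.radarSearchURI +
            '?key=' + config.apiKey +
            '&location=' + config.latitude + ',' + config.longitude +
            '&radius=' + config.searchRadius +
            '&keyword=' + arrayItem, headers: config.headers
        }, function(err, response, body) {
            if (err) return done(err);
            var responseBody = JSON.parse(body);
            // Passing 'done' as the appendFile callback tells async
            // exactly when this file has been written (or failed).
            fs.appendFile(
                config.dataDirectory + arrayItem + '.json',
                JSON.stringify(responseBody.results, null, '\t'),
                done
            );
        });
    }, function(err) {
        if (err) console.error(err);
        // With the IPC channel closed, nothing is left in the event
        // loop, so the worker exits on its own.
        cluster.worker.disconnect();
    });
});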
– loganfsmyth
  • Thanks so much! This was really bogging me down the last few days. The key was tying into async's `done()`; thanks for the overview on it. Can you expand more on why you recommend `cluster.worker.disconnect()` over `process.exit(0)`? –  Feb 09 '15 at 21:06
  • Node processes automatically exit when they have nothing left to do in their event loop, so `.disconnect` tells the worker to stop talking to the master process, and with that disconnected, the process has nothing else keeping it alive. – loganfsmyth Feb 09 '15 at 22:03

Node's fs.appendFile( ... ) is an asynchronous function, so it expects us to pass a callback through which it can tell us that its main operation has finished, or that an error occurred.

This means we need to call Node's process.exit( ... ) inside the provided callback. I've written this code to test:

'use strict';

var fs = require('fs');

function jsonValue(obj) {
    return JSON.stringify(obj, null, '\t');
}

fs.appendFile('file.json', jsonValue(['t', 'e', 's', 't']), function(error) {
    if (error) {
        throw error;
    }

    console.log('success writing file');  // no error, so log...
    process.exit();                       // and exit right now
    console.log('exited?');               // this will not be printed
});

Well, it works as expected.

Another way that works is to use the synchronous version, fs.appendFileSync( ... ), and call process.exit() sequentially:

fs.appendFileSync('file.json', jsonValue(['t', 'e', 's', 't']));

console.log('success writing file'); // no error (I hope so =), so log...
process.exit(); // and exit right now
console.log('exited?'); // this will not be printed

This is clean code and it works, but you lose the robustness and convenience you gain with the callback...
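
If you want that error handling back, one minimal sketch is to wrap the synchronous call in a try/catch, since appendFileSync throws instead of passing an error to a callback:

try {
    fs.appendFileSync('file.json', jsonValue(['t', 'e', 's', 't']));
    console.log('success writing file'); // only reached when nothing threw
    process.exit();                      // exit with the default success code
} catch (error) {
    console.error(error); // the error thrown by appendFileSync
    process.exit(1);      // exit with a non-zero (failure) code
}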

– ranieribt