187

In my experience, a PHP server would report an exception to the log or to the server end, but Node.js simply crashes. Surrounding my code with a try-catch doesn't work either, since everything is done asynchronously. I would like to know what everyone else does on their production servers.

TiansHUo

9 Answers

155

PM2

First of all, I would highly recommend installing PM2 for Node.js. PM2 is very good at handling crashes and monitoring Node apps, as well as load balancing. PM2 restarts the Node app immediately whenever it crashes or stops for any reason, and can bring it back up when the server reboots. So if the app still crashes someday despite careful coding, PM2 can restart it immediately. For more info, see Installing and Running PM2.

The other answers rely on uncaughtException, which is risky, as Node's own documentation explains at http://nodejs.org/docs/latest/api/process.html#process_event_uncaughtexception

If you are using the approach from those answers, read what the Node docs say:

Note that uncaughtException is a very crude mechanism for exception handling and may be removed in the future

Now, coming back to the solution for preventing the app itself from crashing.

After going through the docs, I finally settled on what the Node documentation itself suggests:

Don't use uncaughtException, use domains with cluster instead. If you do use uncaughtException, restart your application after every unhandled exception!

DOMAIN with Cluster

What we actually do is send an error response to the request that triggered the error, while letting the others finish in their normal time, and stop listening for new requests in that worker.

In this way, domain usage goes hand-in-hand with the cluster module, since the master process can fork a new worker when a worker encounters an error. See the code below to understand what I mean

By using Domain, and the resilience of separating our program into multiple worker processes using Cluster, we can react more appropriately, and handle errors with much greater safety.

var cluster = require('cluster');
var PORT = +process.env.PORT || 1337;

if (cluster.isMaster)
{
    // Start two workers.
    cluster.fork();
    cluster.fork();

    // When a worker disconnects, fork a replacement.
    cluster.on('disconnect', function(worker)
    {
        console.error('disconnect!');
        cluster.fork();
    });
}
else
{
    var domain = require('domain');
    var server = require('http').createServer(function(req, res)
    {
        var d = domain.create();
        d.on('error', function(er)
        {
            // Something unexpected occurred.
            console.error('error', er.stack);
            try
            {
                // Make sure we close down within 30 seconds.
                var killtimer = setTimeout(function()
                {
                    process.exit(1);
                }, 30000);
                // But don't keep the process open just for that!
                killtimer.unref();
                // Stop taking new requests.
                server.close();
                // Let the master know we're dead. This will trigger a
                // 'disconnect' in the cluster master, and then it will fork
                // a new worker.
                cluster.worker.disconnect();

                // Send an error to the request that triggered the problem.
                res.statusCode = 500;
                res.setHeader('content-type', 'text/plain');
                res.end('Oops, there was a problem!\n');
            }
            catch (er2)
            {
                // Oh well, not much we can do at this point.
                console.error('Error sending 500!', er2.stack);
            }
        });
        // Because req and res were created before this domain existed,
        // we need to explicitly add them.
        d.add(req);
        d.add(res);
        // Now run the handler function in the domain.
        d.run(function()
        {
            // You'd put your fancy application logic here.
            handleRequest(req, res);
        });
    });
    server.listen(PORT);
}

// Placeholder for the application logic so the example runs as-is.
function handleRequest(req, res)
{
    res.end('ok\n');
}

Note, though, that Domain is pending deprecation and will be removed once a replacement is available, as stated in Node's documentation:

This module is pending deprecation. Once a replacement API has been finalized, this module will be fully deprecated. Users who absolutely must have the functionality that domains provide may rely on it for the time being but should expect to have to migrate to a different solution in the future.

But until a replacement is introduced, Domain with Cluster is the only good solution that the Node documentation suggests.
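For readers who would rather not depend on the deprecated domain module, the variation raised in the comments (re-forking on the cluster `exit` event) could be sketched roughly as below. This is not the code from the Node docs: the helper names `superviseWorkers`, `makeFatalHandler` and `start` are made up for illustration, and the worker simply logs, stops accepting connections and exits on any uncaught exception, so unlike the domain version it cannot send a 500 to the request that failed.

```javascript
const cluster = require('cluster');
const http = require('http');

// Master side: start `count` workers and fork a replacement whenever one exits.
function superviseWorkers(count) {
  for (let i = 0; i < count; i++) {
    cluster.fork();
  }
  cluster.on('exit', function (worker, code, signal) {
    console.error('worker ' + worker.process.pid + ' exited (' + (signal || code) + ')');
    cluster.fork();
  });
}

// Worker side: build a last-resort handler that logs, stops accepting new
// connections, and exits so the master can replace this worker.
// `exit` and `graceMs` are parameters only so the sketch can be tried out
// without really killing a process.
function makeFatalHandler(server, exit, graceMs) {
  exit = exit || function (code) { process.exit(code); };
  graceMs = graceMs || 30000;
  return function onFatal(err) {
    console.error('fatal error', err && err.stack);
    // Exit anyway if open connections never finish.
    const killTimer = setTimeout(function () { exit(1); }, graceMs);
    killTimer.unref();
    // Stop taking new requests; exit once in-flight ones are done.
    server.close(function () {
      clearTimeout(killTimer);
      exit(1);
    });
  };
}

// Hypothetical entry point; `handleRequest` and `port` come from your app.
function start(handleRequest, port) {
  // Newer Node releases also expose this flag as `cluster.isPrimary`.
  if (cluster.isMaster) {
    superviseWorkers(2);
  } else {
    const server = http.createServer(handleRequest);
    process.on('uncaughtException', makeFatalHandler(server));
    server.listen(port);
  }
}
```

Calling `start(handleRequest, 1337)` would run two workers behind one port. The design choice is the one the docs insist on: after an unhandled exception the worker is treated as unusable and replaced rather than kept alive.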

For an in-depth understanding of Domain and Cluster, read:

https://nodejs.org/api/domain.html#domain_domain (Stability: 0 - Deprecated)

https://nodejs.org/api/cluster.html

Thanks to @Stanley Luo for sharing this wonderful in-depth explanation of Cluster and Domains:

Cluster & Domains

Airy
  • Thanks @CodeDevil, actually I answered on March 2013 while the question was asked about 2 years before, so before my answer others were voted but now I can see that people are voting for my answer. Thanks for your kind comment. – Airy Apr 05 '14 at 07:53
  • 1
    By this way, you may need to wrap the whole code and libraries inside `d.run(function(){ //here })`. And in real life, it sucks. – Lewis Jul 23 '14 at 19:05
  • 9
    A word of warning, Domain is pending deprecation: [link](https://nodejs.org/api/domain.html). The suggested method, from the Node docs, is to use cluster: [link](https://nodejs.org/api/cluster.html). – Paul Oct 28 '15 at 22:06
  • @Paul Thanks Paul. I am working on it. And I will update my answer just as I finish working and learning about new solution of preventing Node from crashing. Thanks for informing me. – Airy Nov 04 '15 at 08:29
  • 6
    `restart your application after every unhandled exception!` In case 2000 users are using a node web server for streaming video and 1 user got an exception then restarting won't interrupt all the other users? – Vikas Bansal May 23 '16 at 17:39
  • 2
    @VikasBansal Yes, that would interrupt all the users, and that's why it's bad to rely on `uncaughtException`; use `Domain` with `Cluster` instead. Then, if one user hits an exception, only the worker process that handled that request is taken out of the cluster and a new worker is forked in its place, while the other workers keep serving. You don't need to restart the whole Node server. With `uncaughtException` alone, on the other hand, you have to restart your server every time any user hits a problem. So, use Domain with Cluster. – Airy May 29 '16 at 12:22
  • @VikasBansal in short, `restart your application after every unhandled exception!` is when you use `uncaughtException` and if you use 'Domain' with 'Cluster' as explained in above example, you don't need to do anything. Everything is then handled automatically. – Airy May 29 '16 at 12:26
  • Many thanks for elaborating, but why have you called `cluster.fork(); cluster.fork();` twice? – Vikas Bansal May 29 '16 at 13:45
  • 1
    @VikasBansal Each `cluster.fork()` call starts one worker process, so calling it twice starts two workers. Calling it once also works; you would then have a single worker. – Airy May 31 '16 at 10:32
  • 4
    what should we do when `domain` is fully deprecated and removed? – Jas Jun 26 '16 at 10:07
  • @Jas Currently we can't say much about it but as soon as the new solution comes in we can probably update our code easily. It has been 3 years but still domain is the only solution. So, we should not be worried. – Airy Jun 27 '16 at 06:20
  • @GiorgiMoniava yes you are right but unfortunately this is the only preferred solution prescribed by Node Documents for last 2-3 Years. – Airy Feb 06 '17 at 07:09
  • 3
    Found this tutorial for those who don't understand the concept of `cluster` and `workers`: https://www.sitepoint.com/how-to-create-a-node-js-cluster-for-speeding-up-your-apps/ – Stanley Luo Feb 08 '17 at 01:33
  • @AbdulJabbarWebBestow What is the advantage of the Cluster+Domain approach compared to the approach in Stanley Luo's article: using cluster.on('exit', function(worker, code, signal) and forking a new worker inside it? – Giorgi Moniava Sep 24 '17 at 17:05
  • @GiorgiMoniava I couldn't get your point, because the article Stanley suggested covers the same approach I have written about, only in more depth. P.S. In short, Cluster+Domain is the best technique because it keeps your Node server running. – Airy Sep 25 '17 at 07:56
  • This works great, just two things I'd add. The killtimer here times out in 30 seconds; I think it's worth considering either setting your entire server timeout to 30 seconds or increasing the kill timeout to 2 minutes (https://nodejs.org/docs/latest-v12.x/api/http.html#http_server_settimeout_msecs_callback). Second, in a well-behaved app, restarting the thread should be a last resort. I recommend wrapping all handlers in try-catch and all callbacks with promises to make request handling "airtight" where possible. – ZeroG May 31 '20 at 18:48
  • @ZeroG You are right that the kill timeout should match the server timeout, or the server timeout should be changed. I tried to keep it simple so that an average user can understand the code, while an advanced user can change the code/configuration where necessary. – Airy Jun 04 '20 at 09:28
  • Now PM2 or cluster domain? which one – Ali Sherafat Dec 11 '21 at 12:05
  • @AliSherafat PM2 is recommended so that your app/server is restarted if it crashes for any reason, while Cluster with Domain is the way to implement your server. They are different things and both are recommended. So, I would say use Cluster with PM2. – Airy Dec 11 '21 at 12:10
103

I put this code right under my require statements and global declarations:

process.on('uncaughtException', function (err) {
  console.error(err);
  console.log("Node NOT Exiting...");
});

Works for me. The only thing I don't like about it is that I don't get as much info as I would if I just let the thing crash.

hvgotcodes
  • 52
    A word of caution: this method works nicely, BUT remember that ALL HTTP responses need to be ended properly. That means that if an uncaught exception occurs while you are handling an HTTP request, you must still call end() on the http.ServerResponse Object. However you implement this is up to you. If you do not do this, the request will hang until the browser gives up. If you have enough of these requests, the server can run out of memory. – BMiner Nov 13 '11 at 21:31
  • BMiner, can you give an example? – tofutim Jan 28 '12 at 00:33
  • 4
    @BMiner, could you provide a better implementation? I noticed this problem (request hanging) so this really isn't better than just restarting the server using `forever` or something. – pixelfreak Mar 03 '12 at 17:59
  • 6
    This calls for an in-depth explanation. I know this sucks, but whenever an uncaught exception occurs, your server needs to reboot ASAP. Really, the purpose of the 'uncaughtException' Event is to use it as an opportunity to send out a warning email, and then use process.exit(1); to shutdown the server. You can use forever or something like that to restart the server. Any pending HTTP requests will timeout and fail. Your users will be mad at you. But, it's the best solution. Why, you ask? Checkout http://stackoverflow.com/questions/8114977/recover-from-uncaught-exception-in-node-js – BMiner Mar 05 '12 at 03:11
  • 3
    To get more information from the uncaught error, use: console.trace(err.stack); – Jesse Dunlap Mar 29 '13 at 16:35
  • 3
    WARNING: The documentation for node says, in no uncertain terms, that you should never do this as it's crazy dangerous: http://nodejs.org/api/process.html#process_event_uncaughtexception – Jeremy Logan Sep 26 '14 at 17:12
  • 2
    As of node.js 0.10.35, this technique no longer works. Let me check if the domain solution works. – k2k2e6 May 31 '15 at 04:40
  • This did not catch the error for me; instead my app crashed and the cloud server stopped! Any help? – Sadanand Aug 22 '18 at 22:24
  • crazy dangerous, NO i just needed to support an old application that will be replaced soon a quick fix so my main application can run when ignoring a empty response. This is just a working temporary solution. – Martijn van Wezel Dec 01 '18 at 01:03
35

As mentioned here you'll find error.stack provides a more complete error message such as the line number that caused the error:

process.on('uncaughtException', function (error) {
   console.log(error.stack);
});
Sean Bannister
11

Try supervisor

npm install supervisor
supervisor app.js

Or you can install forever instead.

All this will do is recover your server when it crashes by restarting it.

forever can be used within the code to gracefully recover any processes that crash.

The forever docs have solid information on exit/error handling programmatically.

Michael_Scharf
Raynos
  • 10
    Surely this can't be the solution... In the time during which the server is down it can't respond to new incoming requests. An exception might be thrown from application code - the server needs to respond with a 500 error, not just crash and hope its restarted. – Ant Kutschera May 16 '11 at 20:09
  • @AntKutschera you're supposed to have a cluster / load balancer in front of your node. Never run one instance, always run 4 (on a quadcore). – Raynos May 16 '11 at 20:10
  • 23
    So as a hacker, one could figure out that they need to send a simple request to the server and miss out a request parameter - that leads to an undef in the javascript which causes node.js to crash. With your suggestion, I can kill your entire cluster repeatedly. The answer is to make the application fail gracefully - ie handle the uncaught exception and not crash. what if the server was handling many voip sessions? its not acceptable for it to crash and burn and for all those existing sessions to die with it. your users would soon leave. – Ant Kutschera May 17 '11 at 11:44
  • 6
    @AntKutschera that's why exceptions should be exceptional cases. Exceptions should only fire in situations where you _cannot_ recover and where the process _has_ to crash. You should use other means to handle these _exceptional_ cases. But I see your point. You should fail gracefully where possible. There however cases where continuing with a corrupted state will do more damage. – Raynos May 17 '11 at 12:03
  • 2
    Yes, there are different schools of thought here. The way I learned it (Java rather than JavaScript), there are acceptable exceptions which you should expect, known perhaps as business exceptions, and then there are runtime exceptions or errors, where you shouldn't expect to recover, like out of memory. One problem with not failing gracefully is that some library which I write might declare that it throws an exception in the case of something recoverable, say where a user could correct their input. In your app, you don't read my docs and just crash, where the user might have been able to recover. – Ant Kutschera May 17 '11 at 12:43
  • 1
    @AntKutschera This is why we log exceptions. You should analyze your production logs for common exceptions, and figure out if and how you could recover from them, instead of letting the server crash. I have used that methodology with PHP, Ruby on Rails, and Node. Regardless of whether or not you exit a process, every time you throw up a 500 error, you're doing your users a disservice. This is not JavaScript or Node-specific practice. – Eric Elliott Apr 18 '13 at 01:10
7

Using try-catch can handle some otherwise uncaught errors, but in more complex situations it won't do the job, for example when the error is thrown inside an asynchronous callback. Remember that in Node, any async function call can contain an operation that could crash the app.
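To illustrate the gap, here is a minimal sketch that is not taken from this answer. It assumes request handlers written as promise-returning (`async`) functions, and the wrapper name `safeHandler` is made up for the example. Wrapping a handler this way turns both a synchronous throw and a rejected promise into a 500 response; it does not help with errors thrown from a bare callback, such as one passed to `setTimeout`, which still surface as uncaught exceptions.

```javascript
// Hypothetical helper: wraps a promise-returning request handler so that a
// thrown error or a rejected promise is answered with a 500 instead of
// escaping as an unhandled error.
function safeHandler(handler) {
  return function (req, res) {
    return Promise.resolve()
      .then(function () { return handler(req, res); })
      .catch(function (err) {
        console.error('request failed', err && err.stack);
        if (res.headersSent) {
          // Too late to change the status; just finish the response.
          res.end();
          return;
        }
        res.statusCode = 500;
        res.setHeader('content-type', 'text/plain');
        res.end('Oops, there was a problem!\n');
      });
  };
}
```

It would be used as `http.createServer(safeHandler(async function (req, res) { /* app logic */ }))`, so every response is ended even when the handler fails.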

Using uncaughtException is a workaround, but the Node docs describe it as a crude mechanism that may be removed in future versions, so don't count on it.

The ideal solution is to use domain: http://nodejs.org/api/domain.html

To make sure your app stays up and running even if a process crashes, take the following steps:

  1. Use Node's cluster module to fork multiple processes, typically one per core. If one process dies, the remaining ones keep serving and a new one can be booted up to replace it. Check out: http://nodejs.org/api/cluster.html

  2. Use domain to catch errors from async operations instead of using try-catch or uncaughtException. I'm not saying that try-catch or uncaughtException is a bad idea, though!

  3. Use forever/supervisor to monitor your services.

  4. Add a daemon to run your Node app: http://upstart.ubuntu.com

hope this helps!

franzlorenzon
Nam Nguyen
4

Give the pm2 Node module a try; it is far more consistent and has great documentation. It is a production process manager for Node.js apps with a built-in load balancer. Please avoid uncaughtException for this problem. https://github.com/Unitech/pm2

  • 3
    ` restart your application after every unhandled exception!` In case 2000 users are using a node web server for streaming video and 1 user got an exception then restarting won't interrupt all the other users? – Vikas Bansal May 23 '16 at 17:40
  • I was so happy when I discovered PM2. great piece of software – Mladen Janjetovic Jan 30 '17 at 13:49
1

Works great on restify:

server.on('uncaughtException', function (req, res, route, err) {
  log.info('******* Begin Error *******\n%s\n*******\n%s\n******* End Error *******', route, err.stack);
  if (!res.headersSent) {
    return res.send(500, {ok: false});
  }
  res.write('\n');
  res.end();
});
PH Andrade
1

By default, Node.js handles such exceptions by printing the stack trace to stderr and exiting with code 1, overriding any previously set process.exitCode.

know more

process.on('uncaughtException', (err, origin) => {
    console.log(err);
});
MD SHAYON
0

UncaughtException is "a very crude mechanism" (so true) and domains are deprecated now. However, we still need some mechanism to catch errors around (logical) domains. The library:

https://github.com/vacuumlabs/yacol

can help you do this. With a little extra writing you can have nice domain semantics all around your code!

Tomas Kulich