
I have a Node.js script that runs and exits fine from the console, but under pm2 it doesn't exit unless I call process.exit(). My PM2 config is:

        {
            name: "worker",
            script: "./worker.js",
            restart_delay: 60000,
            out_file: "/tmp/worker.log",
            error_file: "/tmp/worker_err.log"
        },

I've installed why-is-node-running to see what keeps the process running 10 seconds after the expected exit. The output is:



There are 9 handle(s) keeping the process running

# TLSWRAP
node:internal/async_hooks:200

# TLSWRAP
node:internal/async_hooks:200

# ZLIB
node:internal/async_hooks:200                                                 
/Users/r/code/app/node_modules/decompress-response/index.js:43          - const decompressStream = isBrotli ? zlib.createBrotliDecompress() : zlib.createUnzip();
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:586
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:768
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:786

# TLSWRAP
node:internal/async_hooks:200

# ZLIB
node:internal/async_hooks:200                                                 
/Users/r/code/app/node_modules/decompress-response/index.js:43          - const decompressStream = isBrotli ? zlib.createBrotliDecompress() : zlib.createUnzip();
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:586
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:768
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:786

# TLSWRAP
node:internal/async_hooks:200

# ZLIB
node:internal/async_hooks:200                                                 
/Users/r/code/app/node_modules/decompress-response/index.js:43          - const decompressStream = isBrotli ? zlib.createBrotliDecompress() : zlib.createUnzip();
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:586
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:768
file:///Users/r/code/app/node_modules/got/dist/source/core/index.js:786

# TLSWRAP
node:internal/async_hooks:200

# Timeout
node:internal/async_hooks:200            
node:internal/async_hooks:468            
node:internal/timers:162                 
node:internal/timers:196                 
file:///Users/r/code/app/worker.js:65
node:internal/process/task_queues:94     

Why doesn't node exit? How do I further debug this?

PS: Sorry for a large paste

UPDATE

I've managed to reproduce this in a comically small 2-liner:

    import got from "got";
    await got.post('https://anty-api.com/browser_profiles', {form: {a: 123}}).json();

The above code throws as expected when run from the console, yet keeps running forever when launched by pm2.

UPDATE 2

It does reproduce with an empty app file too.

jonrsharpe
  • 1
    You've got a good amount of detail here, but one essential step would be to reduce the amount of code necessary to reproduce the problem down to something that can be posted here. Interesting that it's mentioning the `got` module in the contexts of ZLIB TLSWRAP and a Timeout. But we don't have the actual code to see what's up. Can you whittle this down to a few lines that reproduce the problem and post some code? – Wyck Apr 18 '22 at 14:20
  • 1
    My first thought: does pm2 launch your process with the same privileges, arguments, environment variables, and working directory as when you launch it from the console? Could be a failed file/network operation that then neglects to resolve something. – Wyck Apr 18 '22 at 14:41
  • 1
    @Wyck, thanks, I did. Please check my update – eeeeeeeeeeeeeeeeeeeeeeeeeeeeee Apr 18 '22 at 14:45
  • 1
    When I try this I get an HTTP 401 Unauthorized response, which `got` throws as an exception. Your comically small repro doesn't handle this exception. pm2 can tell the difference between a crash (due to the unhandled exception) and a clean exit. Could that be what's up? Compare: `got.post('https://anty-api.com/browser_profiles', {form: {a: 123}}).json().catch(console.error);` (or try/catch in your awaitable approach.) – Wyck Apr 18 '22 at 15:07
  • @Wyck, doesn't matter, the process won't stop. I see it with `ps aux`. So pm2 correctly assumes it's running, but why is it running? – eeeeeeeeeeeeeeeeeeeeeeeeeeeeee Apr 18 '22 at 15:40
  • 1
    Wait a sec... Does this reproduce with an empty app.js file? Do you perhaps just misunderstand what it means to have "exited" within the context of pm2? Do you understand what [ProcessContainerFork.js](https://github.com/Unitech/pm2/blob/master/lib/ProcessContainerFork.js) does? The node _process_ won't stop in fork mode when your module returns because it maintains its connection to the god process. (That's also why you're observing a restart when you manually add `process.exit()`.) If this reproduces with an empty app.js, then you should modify your question. – Wyck Apr 18 '22 at 19:34
  • @Wyck, yeah it does reproduce with an empty app file – eeeeeeeeeeeeeeeeeeeeeeeeeeeeee Apr 19 '22 at 07:50

1 Answer


I think this is just the way pm2 works. When running under pm2, you can expect the node process to keep running forever (whether or not your app is responsible for any pending async event sources) unless it crashes or you explicitly terminate it, for example with process.exit().

As you've discovered, this has nothing to do with any code in your app.js. Even an empty app.js exhibits this behaviour. This is a fundamental design aspect of pm2: it wraps your program, and it's the wrapper that keeps the node process alive.

This is because pm2 runs your program (in forked mode, as opposed to cluster mode) by launching a node process that runs ProcessContainerFork.js (the wrapper). This module establishes and maintains a connection to pm2's managing process (a.k.a "god daemon") and loads your app's main module with require('module')._load(...). The communication channel will always count as an event source that keeps the actual node process alive.

Even if your program does nothing, the status of your program will be "online". Even if your program reaches the state where, had it been launched directly, node would have exited, the state is still "online" in this case because of the wrapper.

This leaves the designers of pm2 with the challenge of knowing when your program is no longer responsible for any event sources (the point at which node would normally exit). pm2 has no feature for distinguishing between node being kept alive by code you wrote in your app.js and node being kept alive by the infrastructure established by ProcessContainerFork.js. One could certainly imagine pm2 using async_hooks to keep track of event sources originating from your app rather than from ProcessContainerFork.js (much like why-is-node-running does), and then tearing the process down properly once no app-owned sources remain. Perhaps pm2 chooses not to do this to avoid the performance penalty associated with async hooks? Perhaps an app that exits on purpose but is intended to be restarted seems too much like a cron job? I'm speculating that yours is not the primary use case for pm2. I suppose you could make a feature request and see what the pm2 authors have to say about it.
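If you want to check from inside the process which kinds of resources are still holding the event loop open, a rough sketch along these lines may help. It assumes a Node version that provides `process.getActiveResourcesInfo()` (documented as experimental, added in v16.14 / v17.3). The helper name `summarizeActiveResources` is made up for illustration. Note that this only reports resource types; it cannot tell you whether a handle belongs to your app or to pm2's wrapper.

```javascript
// Hypothetical helper: count the active resources that are currently
// keeping the event loop alive, grouped by resource type.
// Assumes process.getActiveResourcesInfo() is available (experimental API).
function summarizeActiveResources() {
  const counts = {};
  for (const type of process.getActiveResourcesInfo()) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Example: log the summary at the point where you expect to be "done".
console.log("active resources:", summarizeActiveResources());
```

Logging this at the end of your worker, once from the console and once under pm2, should give a quick side-by-side view of what differs between the two environments.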

I think this means if you want to gracefully exit and have pm2 restart your program, you'll need to call process.exit to do so. You won't be able to rely on node knowing that there are no more event sources because pm2 is responsible for some of them. You will, of course, have to ensure that all your relevant pending promises or timers have resolved before calling process.exit because that will immediately terminate the process without waiting for pending things to happen.
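A minimal sketch of that pattern, assuming your worker's logic lives in a single async function. The helper name `runThenExit` and its injectable `exit` parameter are hypothetical (they are not part of pm2 or Node); the parameter is only there so the helper can be exercised without actually killing the process.

```javascript
// Hypothetical helper: await all the work, then terminate explicitly so the
// process stops even though pm2's wrapper keeps the event loop alive.
async function runThenExit(main, exit = process.exit) {
  let code = 0;
  try {
    // Everything you care about must be awaited inside main(), because
    // process.exit() will not wait for pending promises or timers.
    await main();
  } catch (err) {
    console.error(err);
    code = 1; // non-zero exit code marks the run as failed
  }
  exit(code);
}

// Intended use at the bottom of worker.js (left commented out here):
// runThenExit(async () => { /* your worker body */ });
```

With the `restart_delay: 60000` from the question's config, and assuming pm2's automatic restart is left enabled, pm2 should then start the worker again roughly a minute after each exit.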

Wyck
  • I am not a pm2 author. This could very well just be an oversight or shortcoming, or even a bug -- I'm not certain. Someone more familiar may have more insight or know some secret command line option to make it restart at the appropriate time. – Wyck Apr 19 '22 at 15:48
  • Thanks for such a thorough response – eeeeeeeeeeeeeeeeeeeeeeeeeeeeee Apr 19 '22 at 20:06