
I have a heavily loaded script built on recursive setTimeout calls. After several days of uninterrupted operation, the sequence sometimes breaks.

Code example:

function someFnc(threadNum) {
  try {
    console.log(`[${threadNum}] Execution...`);

    // a lot of code...

    console.log(`[${threadNum}] Going to sleep 30s...`);
    setTimeout(() => someFnc(threadNum), 30 * 1000);
  } catch(e) {
    console.error(e);
    setTimeout(() => someFnc(threadNum), 1000);
  }
}
// run threads
someFnc(0);
someFnc(1);
someFnc(2);

In console I see this:

[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...


[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...


[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...

It works for 1-2 days, and then some thread (for example, 2) freezes on [2] Going to sleep 30s... The other threads keep working fine, but any of them can freeze later in the same way.

My thoughts:

  1. I have a console.log at the very beginning of the function, so I would definitely see it if the function ran. Therefore, I can conclude that the function was never called again after the hang.
  2. The last message I see is [n] Going to sleep 30s..., which is followed by setTimeout and nothing else.
  3. For these two reasons, I conclude that the problem is with setTimeout.
  4. I have a highly loaded system with millions of setTimeout executions per day. Sometimes my CPU is 100% loaded and freezes for a few seconds. I think this may be why the timer fails; I have no other ideas.

Does anyone know how to track if the timer has been started? How can this be debugged?

Does anyone know how setTimeout works at the kernel level?

Perhaps when I call setTimeout(..., 30 * 1000), the system records that my code should run at, for example, 06/26/2022 10:01:54.123, then checks the current system time every 17 milliseconds or so and runs the callback if it finds a match (within ±50 ms, for example). But if the CPU freezes for 2 seconds, the next tick occurs later, and the timer simply loses or ignores this task and refuses to start it because it is too old?

However, it seems to me that all tasks due after some period (timeout/interval) form a queue, and if one were not executed, the others would not start either. In that case the whole program would freeze, but I can see that the rest of the timers work without problems.

In my case, it is not very important that the timer fires exactly on time, but it must fire. If my assumptions are correct and Node.js is skipping "stale" timers, is it possible to avoid this and force them to run?
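A minimal sketch for testing the "stale timer" assumption: schedule a short timer, then block the event loop for longer than the timer's delay, and see whether the callback still runs. The names `TIMER_MS`, `BLOCK_MS`, and `runBlockedTimerTest` and the chosen delays are made up for illustration; this only simulates a CPU freeze with a busy-wait and does not reproduce the real load.

```javascript
// Illustrative values (assumptions, not taken from the real script).
const TIMER_MS = 100; // requested timer delay
const BLOCK_MS = 300; // how long the event loop is blocked

function runBlockedTimerTest() {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();

    // If stale timers were dropped, this callback would never run
    // and the promise would never resolve.
    setTimeout(() => resolve(Date.now() - scheduledAt), TIMER_MS);

    // Busy-wait past the timer's deadline to simulate a frozen CPU.
    const blockUntil = Date.now() + BLOCK_MS;
    while (Date.now() < blockUntil) {
      // spin
    }
  });
}

const result = runBlockedTimerTest();
result.then((elapsedMs) => {
  console.log(`timer fired after ${elapsedMs} ms (requested ${TIMER_MS} ms)`);
});
```

If the callback runs after the block ends, the timer was delayed rather than discarded, which would point away from the "skipped because it is too old" explanation.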

I was using Node.js 12.x, but decided to update and now I am on v14.18.2.

Bergi
mixalbl4
  • It's a subtle nuance, but it may be important in this case. Javascript, and NodeJS as a consequence, is single threaded. It uses an event loop to handle asynchronous tasks, which can look like multi-threading from the surface. If you want a deeper dive into the event loop, I recommend [this talk](https://www.youtube.com/watch?v=cCOL7MC4Pl0) by Jake Archibald. – CerebralFart Jun 27 '22 at 10:19

1 Answer


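One case I know of is setting a delay of more than about 24.855 days (2^31 - 1 ms, the largest signed 32-bit integer). The timer does not wait that long: Node.js sets the delay to 1 ms, so the callback fires almost immediately.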
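The 24.855-day figure follows from the signed 32-bit limit on the delay. A small sketch of the arithmetic; the constant and helper names here are illustrative, not part of any Node.js API:

```javascript
// Largest delay setTimeout accepts before overflowing a signed 32-bit integer.
const MAX_TIMEOUT_MS = 2 ** 31 - 1; // 2147483647
const MS_PER_DAY = 24 * 60 * 60 * 1000;

const maxTimeoutDays = MAX_TIMEOUT_MS / MS_PER_DAY;
console.log(maxTimeoutDays.toFixed(3)); // 24.855

// Hypothetical helper: true if the delay is inside the range that
// setTimeout honors as given (1 ms up to MAX_TIMEOUT_MS).
function isSafeDelay(ms) {
  return Number.isFinite(ms) && ms >= 1 && ms <= MAX_TIMEOUT_MS;
}

console.log(isSafeDelay(30 * 1000));          // true
console.log(isSafeDelay(30 * MS_PER_DAY));    // false
```

A 30-second delay is far below this limit, so the overflow only matters for very long waits.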

Second, JS timers are not precise: a 1000 ms timeout may fire after 988 ms one time, after 1002 ms another, and so on. So if you have many timeouts (which is not recommended, as it consumes a lot of resources), there is a good chance they will overlap.

If you need ordering, just use a queue or a message broker.

Also, there might be memory issues; see the question "Is there any limit to setTimeout?"

And lastly, I'd change it to:

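// Resolve after the given number of seconds.
const sleep = (seconds) => new Promise(resolve => setTimeout(resolve, seconds * 1000));

// `await` needs an async function (or an ES module), so wrap the loop.
async function worker() {
    while (true) {
        // your code

        await sleep(30);
    }
}

worker();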
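To check whether a loop has actually stopped, one option is a small watchdog: each worker records a heartbeat when it starts an iteration, and a separate interval reports workers that have been silent for too long. This is only a sketch; `beat`, `findStalled`, `startWatchdog`, and the thresholds are hypothetical names and values, not something from the original script.

```javascript
// Hypothetical watchdog state: worker id -> timestamp (ms) of last heartbeat.
const lastBeat = new Map();

// Call at the start of every iteration, e.g. beat(threadNum).
function beat(id, now = Date.now()) {
  lastBeat.set(id, now);
}

// Return the ids of workers silent for longer than maxSilenceMs.
function findStalled(maxSilenceMs, now = Date.now()) {
  const stalled = [];
  for (const [id, at] of lastBeat) {
    if (now - at > maxSilenceMs) stalled.push(id);
  }
  return stalled;
}

// Periodically log stalled workers. Note that this check itself relies on
// a timer, so it shares the event loop with the code it is watching.
function startWatchdog(maxSilenceMs, checkEveryMs) {
  return setInterval(() => {
    const stalled = findStalled(maxSilenceMs);
    if (stalled.length > 0) {
      console.error(`stalled workers: ${stalled.join(", ")}`);
    }
  }, checkEveryMs);
}
```

For a 30-second sleep, something like `startWatchdog(90 * 1000, 10 * 1000)` would flag a worker that has missed roughly two cycles; the flagged worker can then be logged in more detail or restarted.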
EvgenyKolyakov
  • Thanks, but this is not my case :( My timeout is always 30 seconds here. The order does not matter to me, and a 20 or 40 second delay would be fine too. The problem is that my function does not run after 30 seconds, and after that it never starts again, not in an hour, not in a day. The recursion just breaks forever. – mixalbl4 Jun 27 '22 at 10:25
  • Minor correction: the spec requires that the delay is _at least_ the specified timeout. So `setTimeout(..., 1000)` will _not_ call back after 988ms, that'd be a bug. (But it may certainly call back after 1002ms, or 1100ms, or a million ms, etc. -- the engine will try to make it as close to 1000ms as possible, but if the main thread is busy at that time, the callback has to wait its turn.) – jmrk Jun 27 '22 at 10:26
  • **To understand the problem:** you can imagine that every 1,000,000th call to setTimeout would simply be missed. Then the function will never run again, never recurse again. – mixalbl4 Jun 27 '22 at 10:29
  • **upd:** I have enough RAM, and if this bug were related to RAM I would expect to see fatal errors in the console. **About your code:** Your code behaves the same way (I have the same pattern in one module, with the same `sleep` function): after 1-2 days of running it freezes on sleep, because setTimeout never calls resolve due to this strange bug. At the beginning I thought the problem was with `await`, but the last time it happened with a plain setTimeout like in my post. – mixalbl4 Jun 27 '22 at 10:37
  • Works fine for me for weeks.... – EvgenyKolyakov Jun 27 '22 at 10:38