I have a highly loaded script built on recursive setTimeout calls. After several days of uninterrupted operation, the sequence sometimes breaks.
Code example:
function someFnc(threadNum) {
  try {
    console.log(`[${threadNum}] Execution...`);
    // a lot of code...
    console.log(`[${threadNum}] Going to sleep 30s...`);
    setTimeout(() => someFnc(threadNum), 30 * 1000);
  } catch (e) {
    console.error(e);
    setTimeout(() => someFnc(threadNum), 1000);
  }
}

// run threads
someFnc(0);
someFnc(1);
someFnc(2);
In the console I see this:
[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...
[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...
[0] Execution...
[1] Execution...
[2] Execution...
...other logs...
[0] Going to sleep 30s...
[1] Going to sleep 30s...
[2] Going to sleep 30s...
It works for 1-2 days, and then some thread (for example, 2) freezes at [2] Going to sleep 30s...
As a result, the other threads keep working fine, but any of them can later freeze in the same way.
My thoughts:
- I have a console.log at the very beginning of the function, so I would definitely see it if the function ran. Therefore, I can conclude that the function was never called again after the hang.
- The last message I see is [n] Going to sleep 30s..., which is followed by setTimeout and nothing else.
- For these two reasons, I conclude that the problem is with setTimeout.
- I have a highly loaded system with millions of setTimeout executions per day. Sometimes my CPU is 100% loaded and freezes for a few seconds. I think this may be the reason the timer fails; I have no other ideas.
Does anyone know how to track if the timer has been started? How can this be debugged?
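To make the question concrete, here is a rough sketch of the kind of tracking I have in mind. It assumes every timer is scheduled through a wrapper; trackedTimeout, findOverdue, the pending map, and the 5-second grace period are hypothetical names and values of my own, not part of Node.js.

```javascript
// Hypothetical bookkeeping: id -> { label, dueAt } for timers not yet fired.
const pending = new Map();
let nextId = 0;

// Wraps setTimeout and remembers when the callback is supposed to run.
function trackedTimeout(label, fn, delayMs) {
  const id = nextId++;
  pending.set(id, { label, dueAt: Date.now() + delayMs });
  return setTimeout(() => {
    pending.delete(id); // the callback really started
    fn();
  }, delayMs);
}

// Returns the timers whose due time passed more than graceMs ago.
function findOverdue(now = Date.now(), graceMs = 5000) {
  return [...pending.values()].filter((t) => now - t.dueAt > graceMs);
}

// Possible usage inside someFnc (not executed here):
//   trackedTimeout(`thread ${threadNum}`, () => someFnc(threadNum), 30 * 1000);
// and, from a separate periodic check:
//   console.log(findOverdue());
```

If a thread hangs, a non-empty result from findOverdue would mean the timer was scheduled but its callback never started, while an empty result would suggest the timer was never scheduled or the callback started and stopped somewhere else.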
Who knows how setTimeout works at the kernel level?
Perhaps when I call setTimeout(..., 30 * 1000), the system remembers that my code should be executed at, for example, 06/26/2022 10:01:54.123, checks the current system time every 17 milliseconds or so, and runs the callback if it finds a time match (say, within ±50 ms).
But if the CPU freezes for 2 seconds, the next tick occurs later, and the timer simply loses or ignores this task, refusing to start it because it is too old?
However, it seems to me that all tasks that must run after a period of time (timeout/interval) form a queue, and if one of them were not executed, the ones after it would not be started either. In that case the whole program would freeze, but I see that the rest of the timers work without problems.
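A small experiment could test the "stale timer" assumption directly. This is only a sketch: busyWait and runStaleTimerExperiment are hypothetical helper names, and the busy loop merely imitates a CPU freeze by blocking the event loop until the timer is long overdue.

```javascript
// Blocks the event loop for roughly `ms` milliseconds (imitates a freeze).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on purpose
  }
}

// Schedules a timer, then blocks past its due time.
// Resolves with the elapsed milliseconds if the callback ever runs.
function runStaleTimerExperiment(delayMs, blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), delayMs);
    busyWait(blockMs); // the timer becomes overdue while we are blocked
  });
}

// A 100 ms timer combined with a 2-second freeze, as in my scenario.
runStaleTimerExperiment(100, 2000).then((elapsedMs) => {
  console.log(`timer fired ${elapsedMs} ms after scheduling`);
});
```

If stale timers were skipped, the log line would never appear; if it does appear, the callback ran late rather than being dropped, at least under this simplified kind of freeze.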
In my case, the accuracy of the timer is not very important, but the callback must be executed. If my assumptions are correct and Node.js is skipping "stale" timers, is it possible to avoid this and force them to run?
I used Node.js 12.x, but decided to update, and now I have v14.18.2.