3

Please check if my understanding about the following for loop is correct.

for(let i=0; i<1000; i++){
  sample_function(i, function(result){});
}

The moment the for loop is invoked, 1000 calls to sample_function will be queued in the event loop. About 5 seconds later a user sends an HTTP request, which is queued after those "1000 events". Usually this would not be a problem because the loop is asynchronous. But let's say that sample_function is a CPU-intensive function. The "1000 events" are then completed consecutively, each taking about 1 second. As a result, the for loop will block for about 1000 seconds.

Would there be a way to solve this problem? For example, would it be possible to let the thread take a "break" every 10 iterations and allow other newly queued tasks to run in between? If so, how would I do it?

Sihoon Kim

2 Answers

3

Try this:

 for(let i=0; i<1000; i++)
 {
    setTimeout(sample_function, 0, i, function(result){});
 }

or

function sample_function(elem, index){..}

var arr = Array(1000);
arr.forEach(sample_function);
Eriks Klotins
  • just to make sure what I understood is correct -> if I use `setTimeout` instead of queueing the whole `for loop` at once, each of the 1000 calls is queued separately, and in between other queued tasks are allowed to run. Is that correct? – Sihoon Kim Nov 09 '18 at 07:21
  • yes. See here for more detailed explanation: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop#Adding_messages – Eriks Klotins Nov 09 '18 at 08:02
  • I think the second example, where you generate an empty array, doesn't work. If you generate an array with a for loop, where each index contains some value, then you can use forEach in the manner shown here. – Matt Korostoff Feb 11 '21 at 01:45
  • As far as I know and have tested, both examples provided are wrong. Running setTimeout in a loop schedules the callbacks one after another on the queue of the event loop's `timers` phase (in your example, 1000 callbacks are put on the `timers` queue). When the event loop enters the `timers` phase, it tries to exhaust that queue, dequeuing the callbacks one by one and pushing them onto the `call stack` to be executed. This means there is no breathing space between iterations (the event loop does not iterate between tasks). – Gandalf Feb 22 '23 at 07:51
  • 1
    fix: `let process = (i) => new Promise((resolve) => { setTimeout(sample_function, 0, i, function(result){ resolve(result) }); }); for(let i=0; i<1000; i++) { await process(i); }` This way we ensure that the loop won't proceed until the promise returned by process is resolved, which happens when setTimeout's callback is called. So after each iteration the event loop has to keep iterating until it reaches the timers phase and executes the setTimeout callback of the current iteration, and that is why there is a breathing space between iterations. – Gandalf Feb 22 '23 at 08:43
  • @Gandalf yea you are correct. Should have mentioned that here as well. `setTimeout` was the hint I needed. But yes, need to wrap that with a Promise. – Sihoon Kim Feb 24 '23 at 09:51
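
A minimal runnable sketch of the idea discussed in the comments above, offered only as an illustration. It assumes a hypothetical synchronous `sampleFunction` standing in for the CPU-heavy work, and it uses `setImmediate` rather than `setTimeout` to hand control back to the event loop after every `every` iterations. The names `sampleFunction`, `yieldToEventLoop`, and `runWithBreaks` are made up for this sketch.

```javascript
// Hypothetical stand-in for the CPU-intensive sample_function (an assumption, not from the question).
function sampleFunction(i) {
  let sum = 0;
  for (let k = 0; k < 1e5; k++) sum += (i * k) % 7;
  return sum;
}

// Resolves on a later turn of the event loop, so pending timers and I/O callbacks can run first.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Runs `total` iterations and takes a "break" after every `every` iterations.
async function runWithBreaks(total, every) {
  const results = [];
  for (let i = 0; i < total; i++) {
    results.push(sampleFunction(i));
    if ((i + 1) % every === 0) {
      await yieldToEventLoop();
    }
  }
  return results;
}
```

Calling `runWithBreaks(1000, 10)` would match the "break every 10 loops" from the question. A smaller `every` keeps the server more responsive at the cost of more scheduling overhead.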
1

There is a technique called partitioning, which you can read about in the Node.js documentation. But as the documentation states:

If you need to do something more complex, partitioning is not a good option. This is because partitioning uses only the Event Loop, and you won't benefit from multiple cores almost certainly available on your machine.

So you can also use another technique called offloading, e.g. using worker threads or child processes. These have downsides of their own, such as having to serialize and deserialize any objects that you wish to share between the event loop (the current thread) and a worker thread or child process.

The following is an example of partitioning that I came up with, in the context of an Express application.
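
For comparison, offloading could look roughly like the sketch below, which uses Node's built-in `worker_threads` module. The function name `hashInWorker`, the `rounds` parameter, and the inline worker source are assumptions made for illustration. The worker code is passed as a string with `eval: true` only to keep the sketch in a single file.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline (an assumption for brevity); it hashes `rounds` values off the main thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const crypto = require('crypto');
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < workerData.rounds; i++) {
    hash.update(String(i));
  }
  parentPort.postMessage(hash.digest('hex'));
`;

// Hypothetical helper: runs the hashing in a worker thread and resolves with the hex digest.
function hashInWorker(rounds) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { rounds } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
```

Only `workerData` and the posted message cross the thread boundary, and both are copied, which is the serialization cost mentioned above.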

const express = require('express');
const crypto = require('crypto');
const randomstring = require('randomstring');

const app = express();
const port = 80;

app.get('/', async (req, res) => {
    res.send('ok');
})

app.get('/block', async (req, res) => {
    let result = [];
    for (let i = 0; i < 10; ++i) {
        result.push(await block());
    }
    res.send({result});
})

app.listen(port, () => {
    console.log(`Listening on port ${port}`);
    console.log(`http://localhost:${port}`);
})

/* takes around 5 seconds to run (varies depending on your processor) */
const block = () => {
    // Promisified only to hand the result back to the caller asynchronously; this is not part of the partitioning technique.
    return new Promise((resolve, reject) => {
        /**
         * https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#partitioning
         * Uses the partitioning technique (setImmediate/setTimeout) to prevent a long-running
         * operation from blocking the event loop completely.
         * There will be a breathing period between calls to block.
         */
        setImmediate(() => {
            const hash = crypto.createHash("sha256");
            const numberOfHashUpdates = 10e5; // 1,000,000 updates
            for (let iter = 0; iter < numberOfHashUpdates; iter++) {
                hash.update(randomstring.generate());
            }
            // Resolve with the hex digest so the result serializes cleanly in the JSON response.
            resolve(hash.digest("hex"));
        })
    });
}

There are two endpoints, / and /block. If you hit /block and then hit /, the / endpoint takes around 5 seconds to respond: it has to wait for the current call to block to finish and is then served during the breathing space (the thing you call a "break").

If setImmediate were not used, the / endpoint would respond only after approximately 10 * 5 seconds (10 being the number of times the block function is called in the for loop).

Also you can do partitioning using a recursive approach like this:

/**
 * 
 * @param items array we need to process
 * @param chunk a number indicating number of items to be processed on each iteration of event loop before the breathing space
 */
function processItems(items, chunk) {
    let i = 0;
    const process = () => {
        let currentChunk = chunk;
        while (currentChunk > 0 && i < items?.length) {
            --currentChunk;
            syncBlock();
            ++i;
        }

        if (i < items?.length) {
            // The key is to schedule the next recursive call (by passing the function to setImmediate)
            // instead of making a direct recursive call (by simply invoking process).
            setImmediate(process);
        }
    }
    process();
}

And if you need to get back the data processed you can promisify it like this:

function processItems(items, chunk) {
    let i = 0;
    let result = [];
    const process = (done) => {
        let currentChunk = chunk;
        while (currentChunk > 0 && i < items?.length) {
            --currentChunk;
            const returnedValue = syncBlock();
            result.push(returnedValue);
            ++i;
        }

        if (i < items?.length) {
            setImmediate(() => process(done));
        } else {
            done && done(result);
        }
    }
    const promisified = () => new Promise((resolve) => process(resolve));
    return promisified();
}

And you can test it by adding this route handler to the other route handlers provided above:

app.get('/block2', async (req, res) => {
    let result = [];

    let arr = [];
    for (let i = 0; i < 10; ++i) {
        arr.push(i);
    }
    result = await processItems(arr, 1);
    res.send({ result });
})
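
To try the chunking idea without Express, a standalone sketch along the following lines may help. The answer never defines `syncBlock`, so the version here is a hypothetical CPU-bound placeholder that receives the current item. `processInChunks` is likewise an illustrative name, and the sketch assumes `chunkSize` is at least 1.

```javascript
// Hypothetical CPU-bound task (an assumption; the answer leaves syncBlock undefined).
function syncBlock(item) {
  let acc = 0;
  for (let k = 0; k < 1e5; k++) acc = (acc + k * item) % 1000003;
  return acc;
}

// Processes `items` in chunks of `chunkSize`, yielding to the event loop between chunks.
function processInChunks(items, chunkSize) {
  return new Promise((resolve) => {
    if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
    const results = [];
    let i = 0;
    const runChunk = () => {
      const end = Math.min(i + chunkSize, items.length);
      for (; i < end; i++) {
        results.push(syncBlock(items[i]));
      }
      if (i < items.length) {
        setImmediate(runChunk); // schedule the next chunk instead of recursing directly
      } else {
        resolve(results);
      }
    };
    runChunk();
  });
}

// Usage: results arrive in the same order as the input items.
processInChunks([1, 2, 3, 4, 5], 2).then((results) => {
  console.log(`processed ${results.length} items`);
});
```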
Gandalf