15

I'm new to Node.js and am currently trying to code array iterations. I have an array of 1,000 items, which I'd like to iterate through in blocks of 50 items at a time due to server load problems.

I currently use a forEach loop, as seen below, which I'm hoping to transform into the block iteration mentioned above.

   //result is the array of 1000 items

   result.forEach(function (item) {
     //Do some data parsing
     //And upload data to server
   });

Any help would be much appreciated!

UPDATE (in response to reply)

async function uploadData(dataArray) {
    try {
        const chunks = chunkArray(dataArray, 50);
        for (const chunk of chunks) {
            await uploadDataChunk(chunk);
        }
    } catch (error) {
        console.log(error);
        // Catch an error here
    }
}

function uploadDataChunk(chunk) {
    return Promise.all(
        chunk.map((item) => {
            return new Promise((resolve, reject) => {
                //upload code
                //(call resolve/reject once this item's upload finishes/fails)
            });
        })
    );
}
Hendies
  • This sounds like an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) – charlietfl Oct 08 '17 at 15:07
  • I assume that the tasks you are processing in the loop are executed asynchronously. Therefore the main problem is not the loop but that the asynchronous task immediately returns and therefore 1000 async tasks are executed simultaneously, right? – Robert Oct 08 '17 at 15:19
  • @charlietfl yes and no, I know the write requests per second that our database can handle and the blocks function would allow me to keep within those limits – Hendies Oct 08 '17 at 16:44
  • @Robert that's exactly the conundrum, I require the upload to be asynchronous but require it to be limited to the write limit capabilities of our database – Hendies Oct 08 '17 at 16:47

3 Answers

17

You should first split your array into chunks of 50. Then you need to make the requests one by one, not all at once. Promises can be used for this purpose.

Consider this implementation:

function parseData() { } // returns an array of 1000 items

async function uploadData(dataArray) {
  try {
    const chunks = chunkArray(dataArray, 50);
    for(const chunk of chunks) {
      await uploadDataChunk(chunk);
    }
  } catch(error) {
    // Catch an error here
  }
}

function uploadDataChunk(chunk) {
  // return a promise of chunk uploading result
}

const dataArray = parseData();
uploadData(dataArray);

async/await uses promises under the hood, so await waits until the current chunk is uploaded and only then uploads the next one (if no error occurred).
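For instance, here is a minimal sketch of that behaviour (it relies on the chunkArray helper defined below, and the stubbed uploadDataChunk with its 100 ms delay and logging is purely illustrative, not real upload code):

// Stub that pretends to upload a chunk: it resolves after 100 ms.
function uploadDataChunk(chunk) {
  return new Promise((resolve) => {
    setTimeout(() => {
      console.log(`uploaded chunk of ${chunk.length} items`);
      resolve();
    }, 100);
  });
}

async function uploadData(dataArray) {
  const chunks = chunkArray(dataArray, 50); // chunkArray is defined below
  for (const chunk of chunks) {
    await uploadDataChunk(chunk); // the next chunk starts only after this one resolves
  }
  console.log('all chunks uploaded');
}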

And here is my proposal of chunkArray function implementation:

function chunkArray(array, chunkSize) {
  return Array.from(
    { length: Math.ceil(array.length / chunkSize) },
    (_, index) => array.slice(index * chunkSize, (index + 1) * chunkSize)   
  );
}

Note: this code uses ES6 features, so it is desirable to use Babel / TypeScript if you need to target older environments.
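A quick usage check of chunkArray (with made-up numbers) shows what it returns; note that the last chunk is simply shorter when the length is not an exact multiple of the chunk size:

chunkArray([1, 2, 3, 4, 5, 6, 7], 3);
// -> [[1, 2, 3], [4, 5, 6], [7]]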

Update

If you create multiple asynchronous database connections, just use some database pooling tool.
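As a rough sketch only: generic-pool is one such tool, and the createConnection / closeConnection factory functions below are placeholders for whatever your database client actually provides.

const genericPool = require('generic-pool');

// Placeholder factory; swap in your database client's connect/close calls.
const factory = {
  create: () => createConnection(),
  destroy: (connection) => closeConnection(connection)
};

// At most 10 connections are ever open at once.
const pool = genericPool.createPool(factory, { min: 2, max: 10 });

async function withConnection(work) {
  const connection = await pool.acquire(); // waits if all connections are busy
  try {
    return await work(connection);
  } finally {
    pool.release(connection); // hand the connection back to the pool
  }
}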

Update 2

If you want to upload all the items within a chunk asynchronously, and start uploading the next chunk only once the current one has finished, you can do it this way:

function uploadDataChunk(chunk) {
  return Promise.all(
    chunk.map(uploadItemToGoogleCloud) // uploadItemToGoogleCloud should return a promise
  );
}
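One detail worth noting (my addition, not part of the original answer): if any single item in a chunk fails, Promise.all rejects, the await in uploadData throws, and the remaining chunks are skipped. A made-up uploadItemToGoogleCloud illustrates this:

// Made-up per-item upload: fails for one particular item.
function uploadItemToGoogleCloud(item) {
  return item === 'bad'
    ? Promise.reject(new Error('upload failed for ' + item))
    : Promise.resolve(item);
}

uploadDataChunk(['a', 'bad', 'c'])
  .then(() => console.log('chunk done'))
  .catch((error) => console.log('chunk aborted:', error.message));
// -> chunk aborted: upload failed for bad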
Yuriy Yakym
  • I run my database upload from the 50 item chunk in a forEach format, how do I return a promise once the forEach has finished iterating through the chunk? Many thanks Yuriy! – Hendies Oct 09 '17 at 12:32
  • You mean forEach inside `uploadDataChunk`? Do you upload every item separately? What does your update function return? Is it asynchronous? – Yuriy Yakym Oct 09 '17 at 16:14
  • Yes, I use Google Cloud Firestore which can return promises on every file upload. It can't do batch writes in this format though. I have to upload each item separately inside the forEach block, and in an asynchronous manner due to speed. A counter could work but I'm not sure how to implement the completion with your 'await' line and the promise needed to trigger the next chunk upload – Hendies Oct 09 '17 at 17:33
  • If you want to upload all 50 items from the current chunk asynchronously, then see my updated answer (Update 2). If you want to upload each item synchronously, then you do not need chunks. – Yuriy Yakym Oct 09 '17 at 20:06
  • I've tried your new code, and the first chunk uploads perfectly - however the function then finishes and doesn't continue onto the next chunks. I've posted my code as an update to my question. Any ideas? I appreciate the help! – Hendies Oct 10 '17 at 14:45
  • Take a look at this playground. Seems to work as needed. http://jsbin.com/guzebojuta/2/edit?js,console – Yuriy Yakym Oct 10 '17 at 17:33
  • I've tried the code out - it works a charm! Thank you for all the help – Hendies Oct 10 '17 at 18:51
5

You may chunk your array into chunks of the required size as follows:

function chunkArray(a,s){ // a: array to chunk, s: size of chunks
  return Array.from({length: Math.ceil(a.length / s)})
              .map((_,i) => Array.from({length: s})
                                 .map((_,j) => a[i*s+j]));
}

var arr = Array(53).fill().map((_,i) => i); // test array of 53 items
console.log(chunkArray(arr,5))              // chunks of 5 items.
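A caveat of my own, not the answerer's: when the array length is not an exact multiple of s, the trailing slots of the last chunk are filled with undefined (with the 53-item test array above, the last chunk ends with two of them). A variant that trims the last chunk with slice instead would be:

function chunkArray(a, s) { // a: array to chunk, s: size of chunks
  return Array.from({length: Math.ceil(a.length / s)})
              .map((_, i) => a.slice(i * s, (i + 1) * s));
}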
Redu
  • your code works perfectly, but the other answer gave some extra debugging help with database uploads and promises. Up-voted nonetheless! – Hendies Oct 10 '17 at 18:59
  • @Hendies Thank you... But what I show here is just the chunking infrastructure. Then all you have to do is to `Promise.all()` the chunks and `Promise.all()` the main array. It's basically so cool..! – Redu Oct 10 '17 at 19:24
  • adding a filter would be nice- in the sample code, the last array has 2 `undefined` in it. – Capaj Jul 04 '20 at 11:13
2

There's a library for this that used to be very popular: async.js (not to be confused with the async keyword). I still think it's sometimes the cleaner approach, though these days with async/await I tend to do it manually in a for loop.

The async library implements many asynchronous flow-control design patterns. For this case you can use eachLimit:

const eachLimit = require('async/eachLimit');

eachLimit(result, 50,
    function (item, callback) {
        // do your forEach stuff here,
        // then call callback() (or callback(err)) once this item is done
    },
    function (err) {
        // this will be called when everything is completed
    }
);

Or if you prefer you can use the promisified version so that you can await the loop:

const eachLimit = require('async/eachLimit');

async function processResult (result) {
    // ...

    try {
        await eachLimit(result, 50, async function (item) {
            // do your forEach stuff here (an async iteratee needs no callback)
        });
    }
    catch (err) {
        // handle thrown errors
    }
}

In this specific case it's quite easy to batch the operations manually and use await to pause between batches, but the async.js library includes a rich set of functions that are useful to know, some of which are still quite awkward to reproduce even with async/await, like whilst (an asynchronous while), retry, forever, etc. (see the documentation: https://caolan.github.io/async/v3/docs.html)
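As one example, here is a rough sketch of retry; flakyUpload is a made-up task that fails randomly, and the times/interval options follow the async documentation:

const retry = require('async/retry');

// Made-up task that only succeeds some of the time.
function flakyUpload(callback) {
  if (Math.random() < 0.5) {
    callback(null, 'ok');
  } else {
    callback(new Error('upload failed'));
  }
}

// Try up to 5 times, waiting 200 ms between attempts.
retry({ times: 5, interval: 200 }, flakyUpload, function (err, result) {
  if (err) {
    console.log('still failing after 5 attempts:', err.message);
  } else {
    console.log('succeeded with:', result);
  }
});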

slebetman