
I am currently parsing a list of js objects that are upserted to the db one by one, roughly like this with Node.js:

return promise.map(list, item =>
    parseItem(item)
        .then(upsertSingleItemToDB)
).then(() => console.log('all finished!'));

The problem is that when the list grows very large (~3000 items), parsing all the items in parallel becomes too memory-heavy. It was really easy to add a concurrency limit with the promise library and avoid running out of memory that way (when/guard).
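For reference, that concurrency limit can look roughly like this with when and when/guard (just a sketch; the limit of 10 and the guardedUpsert name are arbitrary, while parseItem and upsertSingleItemToDB are the functions from the snippet above):

const when = require('when');
const guard = require('when/guard');

// guard.n(10) allows at most 10 parse+upsert chains to be in flight at once
const guardedUpsert = guard(guard.n(10), item =>
    parseItem(item).then(upsertSingleItemToDB)
);

return when.map(list, guardedUpsert)
    .then(() => console.log('all finished!'));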

But I'd like to optimize the db upserts as well, since MongoDB offers a bulkWrite function. Since parsing and bulk writing all the items at once is not possible, I would need to split the original object list into smaller sets that are parsed with promises in parallel, and then the result array of each set would be passed to the promisified bulkWrite. This would be repeated for the remaining sets of list items.

I'm having a hard time wrapping my head around how I can structure the smaller sets of promises so that I only do one set of parseSomeItems-BulkUpsertThem at a time (something like Promise.all([set1Bulk, set2Bulk]), where set1Bulk is another array of parallel parser promises?). Any pseudo-code help would be appreciated (but I'm using when if that makes a difference).

blub

2 Answers


It can look something like this, if using mongoose and the underlying nodejs-mongodb-driver:

const saveParsedItems = items => ItemCollection.collection.bulkWrite( // accessing underlying driver
  items.map(item => ({
    updateOne: {
      filter: {id: item.id}, // or any compound key that makes your items unique for upsertion
      upsert: true,
      update: {$set: item} // should be a key:value formatted object
    }
  }))
);


const parseAndSaveItems = (items, offset = 0, limit = 3000) => { // the algorithm for retrieving items in batches can be anything you want, basically
  const itemSet = items.slice(offset, offset + limit);

  return Promise.all(
    itemSet.map(parseItem) // parsing all items of the current batch first
  )
    .then(saveParsedItems)
    .then(() => {
      const newOffset = offset + limit;
      if (items.length > newOffset) {
        return parseAndSaveItems(items, newOffset, limit); // recurse into the next batch
      }

      return true;
    });
};

return parseAndSaveItems(yourItems);
nainy
    Ah, recursion, of course! My brain was just running around in circles trying to make some long chain of Promises... Thanks a bunch, this was exactly what I was looking for <3 – blub Apr 05 '17 at 11:42

The first answer looks complete. However, here are some other thoughts that came to mind.

As a hack-around, you could call a timeout function in the callback of your write operation before the next write operation runs. This can give your CPU and memory a break in between calls. Even if you add one millisecond between calls, that only adds 3 seconds in total if you have 3000 write objects.
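A sketch of that idea (delay and writeBatchWithPause are hypothetical helpers, and saveParsedItems stands in for whatever write operation you use, e.g. the bulk writer from the first answer):

// Resolves after ms milliseconds, so the promise chain pauses before the next write
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Perform one write, then give the CPU and memory a 1 ms break before continuing
const writeBatchWithPause = batch =>
    saveParsedItems(batch)
        .then(() => delay(1));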

Or you can segment your array of insertObjects and send each segment to its own bulk writer, as sketched below.
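For example (a sketch; chunk, bulkWriteInChunks and the chunk size of 500 are made-up names and values, and saveParsedItems is again the bulk writer from the first answer):

// Split the array into chunks of the given size
const chunk = (arr, size) => {
    const chunks = [];
    for (let i = 0; i < arr.length; i += size) {
        chunks.push(arr.slice(i, i + size));
    }
    return chunks;
};

// Send each chunk to its own bulkWrite call, one chunk after another
const bulkWriteInChunks = (insertObjects, chunkSize = 500) =>
    chunk(insertObjects, chunkSize).reduce(
        (prev, batch) => prev.then(() => saveParsedItems(batch)),
        Promise.resolve()
    );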