0

Issue: getting heap overflow error when trying to call a large number of sequelize calls.

It appears that .map isn't correctly garbage-collecting when being run in concurrency mode.

This code snippet fails:

let trims = [...]; // length is about 50,000
return models.sequelize.transaction((t) => {
  return Promise.map(trims, (trim) => {
    // sequelize
    return models.Trim.upsert(trim, {transaction: t});
  }, {concurrency: 2500});
});

This code snippet is successful:

let trims = [...]; // length is about 50,000
let batches = [];

while(trims.length) {
  batches.push(trims.splice(0, 2500));
}

return models.sequelize.transaction((t) => {
  return Promise.each(batches, (batch) => {
    return Promise.map(batch, (trim) => {
      return models.Trim.upsert(trim, {transaction: t}));
    });
  });
});

What is it about .map that prevents the first snippet from correctly garbage collecting?

Is there a way to fix that code snippet so that it will work correctly without overflowing the heap?

node v6.8.0 bluebird v3.4.6 sequelize v3.27.0

C Deuter
  • 893
  • 1
  • 8
  • 14
  • You should probably report this as a bug to the Bluebird project. You'll need someone with a lot of insight into how Bluebird works to answer this anyway. – mikl Dec 20 '16 at 02:28
  • Do you realize that `Promise.map()` returns results from all 50,000 operations you are running? So, you're asking to collect all the results at once. Nothing in the results can be GCed until all 50,000 are done. Probably you need to restructure your code so you aren't accumulating 50,000 results in memory. – jfriend00 Dec 20 '16 at 03:42
  • @jfriend00 sequelize upsert only returns a bool as the return value, even discarding this value using `.then(() => void 0)` or something along those lines doesn't help this error (and 50k bools is really nothing anyway), so I would assume that it is probably something more along the lines of the StackTrace for each promise. – C Deuter Dec 20 '16 at 18:53
  • My point is that you really don't want to accumulate 50,000 promises just to run them serially and know when the last one is done. That's definitely memory inefficient. Instead, you should probably just sequence manually (example [here](http://stackoverflow.com/questions/29880715/how-to-synchronize-a-sequence-of-promises/29906506#29906506)) and then you will have no accumulation of anything as you go other than any data you manually collect into a variable. – jfriend00 Dec 20 '16 at 18:58
  • @jfriend00 ahh. that's the thing aren't being run serially, they are being run concurrently with the [.map()](http://bluebirdjs.com/docs/api/promise.map.html) call. – C Deuter Dec 20 '16 at 20:42
  • @CDeuter - Well, in looking at [Bluebird's code for `.map()`](https://github.com/petkaantonov/bluebird/blob/master/src/map.js), I don't see any particular reason that resolved promises wouldn't be GCed during the operation, but the code is a bit hard to follow (lots of inheritance and shared code) so I'm not entirely sure whether this is or isn't a promise issue, but as I said if you just control the execution yourself, then you can control that yourself more directly. – jfriend00 Dec 21 '16 at 06:26

0 Answers0