
I am working on a Node backend, trying to optimize a very heavy query to MongoDB via Mongoose. The expected return size is considerable, but for some reason when I make the request, Node begins consuming huge amounts of memory: 200 MB+ for a single big request.

Considering the size of the return is less than 10 MB in most cases, this doesn't seem right. It also refuses to let go of the memory after it has finished. I know this is probably just the V8 GC doing its default behavior, but what concerns me is the huge amount of memory being consumed for a single find() request.

Through testing, I've isolated the problem to the find() call. Once the call completes, it performs some post-processing and then sends the data to a callback, all in an anonymous function. I have tried using the query stream instead of model.find(), but it shows no real improvement. The call looks roughly like this (Item, postProcess, and callback are placeholders for the actual model, processing logic, and handoff):
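// Simplified version of the call; "Item" and "postProcess" are
// placeholders for the actual model and post-processing logic.
Item.find(query, function (err, docs) {
    if (err) return callback(err);

    var results = postProcess(docs);   // post-processing over the full result array
    callback(null, results);
});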

Searching around has not yielded any answers, so I will ask: is there a known way to reduce, control, or optimize the memory usage of Mongoose? Does anyone know why so much excess memory is being used for a single call?

EDIT

As per JohnnyHK's and Blakes Seven's suggestions, combining lean() with streaming, together with pause() and resume(), has improved the runtime and memory usage immensely. Thank you!

  • When processing the `.stream()` output, are you ever calling `.pause()`? Or indeed, what actions are you performing in the "data" event handler, and is there any other queue control implemented there? – Blakes Seven Jul 23 '15 at 00:08
  • I make no pause or resume calls. In "data", all I'm doing is appending the docs onto an array to be sent through the callback on close. – Javeed Jul 23 '15 at 00:19

2 Answers


The default Mongoose .find() of course returns all results as an "array", so that is always going to use memory with large result sets. This leaves the "stream" interface.

The basic problem here is that when you use a stream interface (which inherits from the basic Node stream), each "data" event fires and its associated event handler is executed continuously.

This means that even with a "stream", the subsequent actions in your event handler are "stacking up", at the very least consuming lots of memory and possibly eating up the call stack if further asynchronous processes are fired in there.

So the best thing you can do is start to "limit" the actions in your stream processing. This is as simple as calling the .pause() method:

var stream = model.find().stream();   // or however you build the query

stream.on("data", function (doc) {
    // pause on entry so no further "data" events fire
    stream.pause();

    // do processing on doc here

    stream.resume();                  // then resume when done
});

So .pause() stops further events in the stream from being emitted, which allows the actions in your event handler to complete before continuing, so that they are not all coming at once.

When your handling code is complete, you call .resume(), either directly within the block as shown here, or within the callback of any asynchronous action performed within the block, as in the sketch below. Note that the same rules apply for async actions: "all" of them must signal completion before you should call .resume().
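As a sketch of the asynchronous case, assuming some async operation per document (writeToDisk() here is just a stand-in), the .resume() call moves into its callback:

var stream = model.find().stream();

stream.on("data", function (doc) {
    stream.pause();                       // stop further "data" events

    // writeToDisk is a stand-in for any asynchronous operation
    writeToDisk(doc, function (err) {
        if (err) {
            // handle/log the error; do not resume on failure
            return console.error(err);
        }
        stream.resume();                  // resume only once the async work signals completion
    });
});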

There are other optimizations that can be applied as well; you might do well to look at the available "queue processing" or "async flow control" modules, which can help you get more performance through some parallel execution, as sketched below.
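For example, here is a minimal sketch using the async module's queue (assuming the async module is installed, and where processDoc() is a placeholder for your per-document work) that processes a few documents in parallel while still applying back-pressure:

var async = require("async");

// Allow up to 5 documents to be processed in parallel;
// processDoc is a placeholder for the actual per-document work.
var queue = async.queue(function (doc, done) {
    processDoc(doc, done);
}, 5);

var stream = model.find().stream();

stream.on("data", function (doc) {
    queue.push(doc);
    if (queue.length() > 10)       // simple high-water mark
        stream.pause();
});

queue.drain = function () {
    stream.resume();               // backlog cleared, let more documents flow
};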

But basically, think ".pause(), then process, then .resume() to continue" in order to avoid eating up lots of memory in your processing.

Also, be aware of your "outputs", and similarly try to use a "stream" again if building up something for a response. All of this will be for nothing if the work you are doing just builds up another variable in memory, so it helps to be aware of that.
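As a sketch of that, assuming an Express-style res object is the output, each document can be written to the response as it arrives instead of being buffered into an array:

var stream = model.find().stream();
var first = true;

res.setHeader("Content-Type", "application/json");
res.write("[");

stream.on("data", function (doc) {
    // stream each document straight out rather than buffering it
    res.write((first ? "" : ",") + JSON.stringify(doc));
    first = false;
});

stream.on("close", function () {
    res.end("]");
});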

Blakes Seven

You can use the lean option for a Mongoose query as long as you only need plain JavaScript objects and not full Mongoose document instances. This results in faster performance and reduced memory usage.

model.find().lean().exec(function(err, docs) {...});

You can also combine lean() with streaming the results, which should reduce memory usage even further.

var stream = model.find().lean().stream();
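
As a fuller sketch of consuming such a stream (processDoc and callback are placeholders; substitute your own processing and handoff):

var stream = model.find().lean().stream();

stream.on("data", function (doc) {
    // doc is a plain JavaScript object here, not a Mongoose document
    processDoc(doc);
});

stream.on("error", function (err) {
    callback(err);
});

stream.on("close", function () {
    callback(null);
});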
JohnnyHK
  • Thank you for this suggestion; roughly how much of an improvement could this bring? EDIT: I actually found this: https://groups.google.com/forum/#!topic/mongoose-orm/u2_DzDydcnA/discussion which indicates about a 3x, possibly 3.5x, performance improvement – Doug Molineux Dec 14 '15 at 22:40