
Hello,

I use Node.js to provide an API for storing data in a MongoDB database.

I ran multiple tests on a read method, which takes ids and returns the corresponding documents. The point is that I must return these documents in the specified order. To ensure that, I use the following code:

// Sequentially fetch every element
function read(ids, callback) {
    var i = 0;
    var results = [];
    function next() {
        db.findOne(ids[i], function (err, doc) {
            results.push(err ? null : doc);
            if (ids.length > ++i) {
                return next();
            }
            callback(results);
        });
    }
    next();
}

This way, documents are fetched one-by-one, in the right order. It takes about 11s on my laptop to retrieve 27k documents.

However, I thought that it was possible to improve this method:

// Asynchronously map the whole array
var async = require('async');

function read(ids, callback) {
    async.map(ids, db.findOne.bind(db), callback);
}

After running a single test, I was quite satisfied to see that the 27k documents were retrieved in only 8s using much simpler code.

The problem appears when I repeat the same request: the response time keeps growing, proportionally to the number of elements retrieved: 9s, 10s, 11s, 12s... This problem does not occur in the sequential version.

I tried two versions of Node.js, v6.2.0 and v0.10.29. The problem is the same. What causes this latency and how could I suppress it?

GnxR
  • Why don't you use `find`? – vp_arth Jun 07 '16 at 16:28
  • See here: http://stackoverflow.com/questions/8303900/mongodb-mongoose-findmany-find-all-documents-with-ids-listed-in-array – Ryan Allred Jun 07 '16 at 16:30
  • try to use `async.mapLimit` to prevent overload. But `find({_id: {$in: list}})` is always better, because single database request instead of multiple. – vp_arth Jun 07 '16 at 16:30
  • @vp_arth `async.mapLimit` is exactly what I was looking for! I would like to just `find()` but I must provide the documents in the input order (Mongo's `find` returns them sorted by their id). Thanks a lot! – GnxR Jun 07 '16 at 18:10
  • Maybe a JS-side order restore will be much more efficient. Just try it. I think you can achieve subsecond time for 30k documents. – vp_arth Jun 07 '16 at 18:17

1 Answer


Try using `async.mapLimit` to prevent overload. You will need some tests to tune the limit value for your environment.
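For illustration, here is a minimal hand-rolled sketch of what `async.mapLimit` does under the hood (the limit of 50 and the `db.findOne` usage line are just examples matching the question; the real library handles more edge cases):

```javascript
// Minimal sketch of async.mapLimit: run at most `limit` calls at once,
// storing each result at its input index so order is preserved.
function mapLimit(items, limit, iteratee, callback) {
  var results = new Array(items.length);
  var running = 0, next = 0, done = 0, failed = false;

  if (items.length === 0) return callback(null, results);

  function launch() {
    while (running < limit && next < items.length) {
      (function (i) {
        running++;
        iteratee(items[i], function (err, result) {
          if (failed) return;
          if (err) { failed = true; return callback(err); }
          results[i] = result; // input order, not completion order
          running--;
          done++;
          if (done === items.length) return callback(null, results);
          launch();
        });
      })(next++);
    }
  }
  launch();
}

// Against the question's API this would be used as:
// mapLimit(ids, 50, db.findOne.bind(db), callback);
```

Unlike plain `async.map`, this never has more than `limit` requests in flight at once, which is what keeps the driver's internal queue from growing.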


But `find({_id: {$in: list}})` is always better, because it issues a single database request instead of many.

I suggest trying to restore the original order on the client side.
Something like this:

function read(ids, cb) {
  db.find(
    {_id: {$in: ids.map(id => mongoose.Types.ObjectId(id))}},
    process
  );

  function process(err, docs) {
    if (err) return cb(err);
    return cb(null, docs.sort(ordering))
  }
  function ordering(a, b) {
    return ids.indexOf(a._id.toString()) - ids.indexOf(b._id.toString());
  }
}

Maybe the find query needs to be adjusted; I can't know which exact MongoDB driver you use.

This code is a first attempt; more careful manual sorting can improve performance a lot, since [].indexOf is itself heavy (O(n)).
But I'm almost sure that, even as-is, it will run much faster.


Possible ordering replacement:

var idHash = {};
for(var i = 0; i < ids.length; i++)
  idHash[ids[i]] = i;
function ordering(a, b) {
  return idHash[a._id.toString()] - idHash[b._id.toString()];
}

Any comparison sort is O(n log n) in the best case, but we already know the final position of each found document, so we can restore the original order in O(n):

var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});
function process(err, docs) {
  if (err) return cb(err);
  return cb(null, 
    docs.reduce(
      (c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
      ids.map(id => null))) // pre-fill not-found docs with null
}
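As a self-contained illustration of this O(n) restore, with hypothetical string ids and stub documents in place of a real MongoDB result set (deliberately out of order, with one id missing):

```javascript
var ids = ['a', 'b', 'c', 'd'];
// Map each id to its target position: {a: 0, b: 1, c: 2, d: 3}
var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});

// Stub result set: arbitrary completion order, and 'c' was not found.
var docs = [
  { _id: 'd', n: 4 },
  { _id: 'a', n: 1 },
  { _id: 'b', n: 2 }
];

// Place each doc directly into its known slot; missing ids stay null.
var ordered = docs.reduce(
  (c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
  ids.map(id => null));

// ordered: [{_id:'a',n:1}, {_id:'b',n:2}, null, {_id:'d',n:4}]
```

Each document is written exactly once into a pre-sized array, so no comparisons happen at all.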

A functional style makes the code more flexible. For example, this code can easily be modified to use async.reduce to make it less synchronously blocking.

vp_arth
  • Thanks! I used your `find` solution with the `idHash` object, but in a new array (more time-efficient than in-place sorting, and I need to fill the empty values). It is not as memory efficient as the `async.mapLimit` solution, but gives ~1s response times instead of ~7s (with the test array of 27k ids). – GnxR Jun 07 '16 at 20:16
  • Yeah, you actually don't need any sorting, when you already know each element position. – vp_arth Jun 07 '16 at 22:23
  • @GnxR, take a look how to implement it in functional style :) – vp_arth Jun 07 '16 at 22:43
  • Sorry for the off-topic question, but what kind of assignment is this: (c[id] = i, c)? Destructuring? I think I've never seen it. Thanks in advance! – Jose Hermosilla Rodrigo Jun 07 '16 at 22:51
  • no, it's just the `comma operator`; it means literally `(c, id, i) => {c[id] = i; return c;}` [Docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Comma_Operator) – vp_arth Jun 07 '16 at 22:54
  • I didn't know this usage!! Thanks, so much!! =) – Jose Hermosilla Rodrigo Jun 07 '16 at 22:59
  • @vp_arth The functional style is great, I didn't know about it! Thank you very much! – GnxR Jun 08 '16 at 14:04