
Stack:

  • MongoDB native driver 2.2
  • Node.js v6.11.3
  • Windows 10

If I launch more tasks in parallel than the driver's pool size, I cannot get any response from my Node.js HTTP server.

Example:

I use the async library's async.each to run 60,000 inserts through the Mongo driver with the pool size set to 1. While these tasks are running (slowly, since only one query can run at a time), I cannot fetch any page from my server. I can't even see the request arrive anywhere. The question is: why?

FAQ

  1. I can increase the driver's pool size or use async.eachSeries if I just want to work around the current situation.
  2. I have a recent Node, so maxSockets for agents defaults to Infinity, which makes this brilliant article look too old (see the quick check after this list).
  3. My RAM and CPU are fine the whole time. Disk I/O is OK too.
  4. I will gladly read articles on this topic, but I really need someone to chew this over for me.
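
A quick way to confirm point 2 (an illustrative check, not from the original post):

const http = require("http");
//since Node 0.12 the default global agent allows unlimited sockets
console.log(http.globalAgent.maxSockets); //prints: Infinity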

Here is some code to "try this at home":

const mongoUrl = "mongodb://******";
var db = require('./db'); //basic database workflows

const http = require("http");
const async = require('async');

db.connect(mongoUrl, (err, database) => {
    if (err) {
        console.log(err);
        process.exit(1);
    } else {
        const server = http.createServer(function (request, response) {
            response.writeHead(200, { "Content-Type": "text/plain" });
            response.write("Hello World");
            response.end();
        }).listen(8888);

        console.log("running on 8888");



        var t = setTimeout(function () {
            //here we can see that after 30s node is still working alright
            console.log("Timeout fired in 30 seconds - I'm ok, I can fire timeouts!");
        }, 30 * 1000);


        //let's create a giant array for async.each
        var testArray = [];
        for (let i = 0; i < 60000; i++) {
            testArray[i] = Math.floor(Math.random() * 16777215).toString(16);
        }

        //clear the collection (pass a callback so a missing collection
        //on the first run doesn't cause an unhandled rejection)
        db.get().collection('someCollection').drop(function () { });

        async.each(testArray, function (artist, callback) {
            //insert into mongo
            db.get().collection('someCollection').insert(
                { "unimportant": artist },
                function (err, result) {
                    if (err) return callback(err);
                    return callback(null);
                });
        }, function (err) {
            if (err) return console.log(err);
            return console.log("somehow finished");
        });



        console.log("No blocking operations before this point - hooray!");

    }
});

db.js

var mongodb = require('mongodb');
var MongoClient = mongodb.MongoClient;
var ObjectID = mongodb.ObjectID;

var state = {
  db: null,
};

exports.connect = function(url, done) {
  if (state.db) return done(null, state.db);

  MongoClient.connect(url, {
      poolSize: 1,      //deliberately tiny pool to reproduce the issue
  }, function(err, db) {
    if (err) return done(err);
    state.db = db;
    state.discogsDB = db.db("discogs");
    //pass the database along, so the (err, database) callback
    //signature used by the caller actually receives it
    done(null, state.db);
  });
};

exports._id = function(id) {
  return new ObjectID(id);
};

exports.get = function() {
  return state.db;
};

exports.getD = function() {
  return state.discogsDB;
};

exports.close = function(done) {
  if (state.db) {
    state.db.close(function(err, result) {
      state.db = null;
      state.discogsDB = null;
      done(err);
    });
  } else {
    done();
  }
};
Amantel

1 Answer


When running that code, Node.js takes up an entire CPU core, which almost always means that the event loop is being blocked.
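
One way to see the blocked loop for yourself (an illustrative check, not part of the original answer) is to measure how late a repeating timer fires while the inserts are being queued:

//measure event-loop lag: each tick should arrive ~1000ms after the
//previous one; anything far beyond that means the loop was blocked
var last = Date.now();
setInterval(function () {
    var lag = Date.now() - last - 1000;
    if (lag > 100) console.log("event loop lagged by " + lag + "ms");
    last = Date.now();
}, 1000);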

Because async.each() runs the iteratee function on all items in parallel, you're effectively starting 60K concurrent queries. I think the MongoDB driver gets overwhelmed by that, not least because you're starving it with a very limited pool size.

Even if you increased the pool size considerably, async.each() isn't the right tool to use. A better solution, in between async.each() (fully parallel) and async.eachSeries() (strictly sequential), is async.eachLimit(), which lets you control the number of concurrent "tasks" that run.
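
A minimal sketch of that fix, reusing testArray and the db helper from the question (the limit of 10 is an assumption; tune it to roughly match the pool size):

async.eachLimit(testArray, 10, function (artist, callback) {
    //at most 10 of these inserts are in flight at any moment
    db.get().collection('someCollection').insert(
        { "unimportant": artist },
        function (err) {
            callback(err);
        });
}, function (err) {
    if (err) return console.log(err);
    console.log("finished without flooding the driver");
});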

robertklep
  • Hi, robertklep! Why do 60K concurrent queries overwhelm my driver? I thought Mongo was created to deal with situations like this. If I have 100K users, there will be a chance of a situation like this even if I try to put each user's requests in series. My test rig has a better CPU and more RAM than the future server, so that is a bit of an issue – Amantel Oct 19 '17 at 06:37
  • @Amantel it's not necessarily the amount of queries that you _start_ that is causing the problem, it's the pool size of just 1. It means that you are creating a deliberate bottleneck in your application. Use a decent pool size (at least 10, probably higher), limit the number of concurrent queries to about the same (using `async.eachLimit`), and see what sort of results that yields. – robertklep Oct 19 '17 at 07:44
  • True. I made it 3000, and that helps, for a time. I saw in the [mongo production docs](https://docs.mongodb.com/manual/administration/production-notes/) that it should be set to 100-115% of average requests, but even if I set it that high, I start to get 100% CPU load, at which point node stops serving static pages. – Amantel Oct 19 '17 at 07:46
  • @Amantel the pool size is 3000, or the number of concurrent queries? – robertklep Oct 19 '17 at 07:49
  • The pool size is 3000. I am doing a lot of async queries, but I try to keep concurrent queries under 1000. I have ~100% CPU load (~590% across my 6 cores), and in mongostat the **conn** value slowly grows from 1-3 to 549, then stops. My point is that mongo should not strain the CPU that much in any case. – Amantel Oct 19 '17 at 07:53
  • As a point of interest - I find it strange to see so many mongo PIDs (https://pasteboard.co/GPCCbWB.png) when I should have one connection and a big pool. – Amantel Oct 19 '17 at 08:01
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/157050/discussion-between-amantel-and-robertklep). – Amantel Oct 19 '17 at 08:25
  • Chat discussion failed, so here is some additional info: with db.serverStatus I get { "current" : 197, "available" : 51003, "totalCreated" : 1973 } and 100% CPU load. Very strange. And with a maximum pool size of 3000 I see 500 conn in mongostat, and again 100% CPU load – Amantel Oct 25 '17 at 09:57
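
For reference, the connection numbers quoted in that last comment can also be read from Node; a minimal sketch using the db helper from the question (whether your user is allowed to run serverStatus depends on your privileges):

//pull the same stats as db.serverStatus() in the shell, via the
//driver's command() helper and the question's db module
db.get().command({ serverStatus: 1 }, function (err, status) {
    if (err) return console.log(err);
    console.log(status.connections); //{ current, available, totalCreated, ... }
});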