2

Performance of individual findOne query is abnormally slow (upwards of 60-85ms). Is there something fundamentally wrong with the design below? What steps should I take to make this operation faster?

Goal (fast count of items within a range, under 10-20ms):

  • Input max and min time
  • Query database for document with closest time for max and min
  • Return the "number" field of both query result
  • Take the difference of the "number" field to get document count

Setup

MongoDB database

3000 documents, compound ascending index on time_axis, latency_axis, number field

[   {   time_axis:1397888153982,latency_axis:5679,number:1},    
    {   time_axis:1397888156339,latency_axis:89 ,number:2}, 

                ...     
    {   time_axis:1398036817121,latency_axis:122407,number:2999},   
    {   time_axis:1398036817122,latency_axis:7149560,number:3000}   ]

NodeJs

exports.getCount = function (uri, collection_name, min, max, callback) {
    var low, high;
    var start = now();
    MongoClient.connect(uri, function(err, db) {
        if(err) {
            return callback(err, null);
        }
        var collection = db.collection(collection_name);
        async.parallel([
            function findLow(callback){
                var query = {time_axis : { $gte : min}};
                var projection = { _id: 0, number: 1};
                collection.findOne( query, projection, function(err, result) {
                    low = result.number;
                    console.log("min query time: "+(now()-start));
                    callback();
                });
            },
            function findHigh(callback){
                var query = {time_axis : { $gte : max}};
                var projection = { _id: 0, number: 1};
                collection.findOne( query, projection, function(err, result) {
                    high = result.number;
                    console.log("max query time: "+(now()-start));
                    callback();
                });
            }
        ], 
        function calculateCount ( err ){ 
            var count = high - low;
            db.close();
            console.log("total query time: "+(now()-start));
            callback(null, count);
        });
    });
}

Note: Thank you for Adio for the answer. It turns out mongodb connection only need to be initialized once and handles connection pooling automatically. :)

Community
  • 1
  • 1
jackluo923
  • 105
  • 1
  • 7
  • Where is the `mongodb` located? Locally or remote? – majidarif Jul 24 '14 at 05:18
  • 1
    I am not nodeJS developer but I see that you create a connection for every query, can you try to share the connection ? connection to the DB take time :) – Adelin Jul 24 '14 at 05:23
  • @majidarif Mongodb is located locally: 127.0.0.1 – jackluo923 Jul 24 '14 at 05:24
  • What @Adio said, and try using connection pools. – majidarif Jul 24 '14 at 05:30
  • Also running them in parallel doesn't really make them run in parallel. Remember that node runs on a single thread. Try only doing one query and see if there is a difference. Though I doubt there will be any. – majidarif Jul 24 '14 at 05:31
  • 1
    @Adio You are absolutely right! Taking the connection time out of the scope, each individual query takes 12-13ms. Mongodb handles the requests in parallel thus the total query time for both results is also 13ms. :) – jackluo923 Jul 24 '14 at 05:43
  • @majidarif Although node is single core, it can handle parallelism better because its asynchronous design. :) Thanks for suggesting "connection pool" idea. – jackluo923 Jul 24 '14 at 05:44
  • Glad to hear that I was useful :) – Adelin Jul 24 '14 at 06:04
  • I will provide my suggestion as an answer so that others can see it :) – Adelin Jul 24 '14 at 06:04

2 Answers2

0

Try to use the --prof option in NodeJs to generate profiling results and you can find out where it spent time on. eg. node --prof app.js

you will get a v8.log file containing the profiling results. The tool for interpreting v8.log is linux-tick-processor, which can be found within the v8 project v8/tools/linux-tick-processor.

To obtain linux-tick-processor:

git clone https://github.com/v8/v8.git
cd v8 
make -j8 ia32.release
export D8_PATH=$PWD/out/ia32.release
export PATH=$PATH:$PWD/tools

linux-tick-processor /path/to/v8.log | vim -
majidarif
  • 18,694
  • 16
  • 88
  • 133
  • You do realize if mongoDB query is slow this won't tell us about it, because in node that would be asynchronous network call, and profiling will only show us problems that takes a lot of CPU time. – Farid Nouri Neshat Jul 24 '14 at 05:49
0

Looking at your source code, I can see that you create a new connection every time you query MongoDB. Try provide an already created connection and thus reuse the connection you create. Coming from Java world I think you should create some connection pooling.

You can also check this question and its answer.

" You open do MongoClient.connect once when your app boots up and reuse the db object. It's not a singleton connection pool each .connect creates a new connection pool. " 
Community
  • 1
  • 1
Adelin
  • 18,144
  • 26
  • 115
  • 175