
Bit of an odd one on query performance... I need to run a query that returns the total count of documents and can also return a result set that can be limited and offset.

So, I have 57 documents in total, and the user wants 10 documents offset by 20.

I can think of 2 ways of doing this. The first is to query for all 57 documents (returned as an array) and then use Array.slice to return the documents they want. The second option is to run 2 queries: the first using Mongo's native count method, then a second using Mongo's native $limit and $skip operators.

Which do you think would scale better? Doing it all in one query, or running two separate ones?

Edit:

// 1 query
var limit = 10;
var offset = 20;

Animals.find({}, function (err, animals) {
    if (err) {
        return next(err);
    }

    // count everything, then slice out the requested page in JS
    res.send({count: animals.length, animals: animals.slice(offset, limit + offset)});
});


// 2 queries
// (in Mongoose, options such as limit/skip go in the third argument;
// the second argument is the field projection)
Animals.find({}, null, {limit: 10, skip: 20}, function (err, animals) {
    if (err) {
        return next(err);
    }

    Animals.count({}, function (err, count) {
        if (err) {
            return next(err);
        }

        res.send({count: count, animals: animals});
    });
});
leepowell
  • I am unsure about Mongoose, however the default `count()` function in PHP does not take `limit` or `skip` into account unless told to, so just running one query with limit and skip and then getting the count should probably give the most performant solution here. However, how will you know there are 57 documents if you don't do two queries to count what is currently there? Do you have a static number that never changes? If not, then you will need to do both the skip and limit and then the count. – Sammaye Dec 18 '12 at 15:06
  • Sorry, I was talking about using Mongo's native count method `db.collection.find().count();` – leepowell Dec 18 '12 at 15:25
  • Sorry, it was me, I misread your question. Hmmm, actually I am not sure which would be better; will your result set always be really low, like 57 docs? If so, then a client-side slice might be a millisecond more performant. – Sammaye Dec 18 '12 at 15:28
  • I've added an example to the original question. I don't think the data will ever get as high as 10,000+, but potentially it could. – leepowell Dec 18 '12 at 15:31
  • At 10k records you *could* see the memory handling of JS be less performant than the `count()` function of MongoDB. The `count()` function in MongoDB is relatively slow, but it is still pretty much as fast as most client-side variations on larger sets, and it could possibly be faster than client-side counting here. But that part is down to your own testing. Mind you, I have counted 10k-length arrays easily before, so it might be faster client-side; it is very hard to say at 10k elements. – Sammaye Dec 18 '12 at 15:35
  • This seems rather subjective and likely depends on the size of your documents. I would try benchmarking with some representative data to see if there is a noticeable difference for your use case. If the user is working with a paginated set of data, there may also be some benefit in slicing/caching the array in the application session to reduce round trips to a remote server. It's also worth noting that with two queries and active updates on a collection, you have potential for the actual count to change between doing the count() and find() queries. – Stennie Jan 01 '13 at 05:48

7 Answers

175

I suggest you use 2 queries:

  1. db.collection.count() will return the total number of items. This value is stored in the collection's metadata in Mongo, so it is not calculated on the fly.

  2. db.collection.find().skip(20).limit(10) — here I assume you sort by some field, so do not forget to add an index on that field. This query will be fast too.

You shouldn't query all items and then perform skip and take, because later, when you have big data, you will have problems with data transfer and processing.
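
A minimal sketch of this two-query approach, assuming Mongoose 5+ (where queries return promises and `countDocuments()` exists) and the `Animals` model from the question; both queries can run in parallel:

// Hedged sketch: run the count and the page query in parallel.
// Assumes Mongoose 5+ and the Animals model from the question.
async function paginateAnimals(offset, limit) {
    const [count, animals] = await Promise.all([
        Animals.countDocuments({}),      // total matching documents
        Animals.find({})
            .sort({ _id: 1 })            // stable order for paging; _id is indexed by default
            .skip(offset)
            .limit(limit)
            .exec()
    ]);
    return { count, animals };
}

// usage: the question's page of 10 documents offset by 20
// paginateAnimals(20, 10).then(result => res.send(result));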

user854301
  • What I'm writing is just a comment without any pretension, but I've heard that the `.skip()` instruction is heavy on the CPU because it walks from the beginning of the collection up to the value specified in the parameter of `.skip()`. It can have a real impact on a big collection! But I don't know which is heavier: using `.skip()` anyway, or getting the whole collection and trimming it with JS... What do you think? – Zachary Dahan Apr 24 '15 at 23:13
  • @Stuffix I've heard the same concerns over using `.skip()`. This [answer](http://stackoverflow.com/questions/5539955/how-to-paginate-with-mongoose-in-node-js/23640287#23640287) touches on it and advises using a filter on a date field instead (a rough sketch of that idea follows this comment thread). One could use this with the `.skip()` & `.limit()` methods. This seems like a good idea. However, I'm having trouble with this OP's question on how to get a count of total documents. If a filter is used to combat the performance implications of `.skip()`, how can we have an accurate count? The count stored in the db won't reflect our filtered data set. – Michael Leanos Mar 14 '16 at 02:31
  • Hi @MichaelLeanos, I am facing the same issue: i.e. how to get a count of total documents. If a filter is used, then how can we have an accurate count? Did you get the solution for this? – virsha Feb 01 '17 at 06:15
  • @virsha, use `cursor.count()` to return the number of filtered documents (it will not execute the query; it will return the number of matched docs). Make sure that your filter and order properties are indexed and everything will be fine. – user854301 Feb 01 '17 at 07:24
  • @virsha Using the `cursor.count()` should work as @user854301 pointed out. However, what I ended up doing was adding an endpoint to my API (`/api/my-collection/stats`) that I used to return various stats on my collections using MongoDB's [db.collection.stats](https://docs.mongodb.com/v3.2/reference/method/db.collection.stats/) feature. Since I really only needed this for my front-end, I just queried the endpoint to return that info independently of my server-side pagination. – Michael Leanos Feb 02 '17 at 20:46
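
For what it's worth, a rough sketch of the range-based ("keyset") idea from the linked answer, assuming an indexed `createdAt` field (the field and function names here are illustrative); the accurate total still has to come from a separate count query:

// Keyset pagination sketch: filter on the last value seen instead of skip().
// Assumes an indexed createdAt field; names are illustrative, not from the thread.
async function nextPage(lastCreatedAt, pageSize) {
    const pageFilter = lastCreatedAt ? { createdAt: { $lt: lastCreatedAt } } : {};
    const [total, docs] = await Promise.all([
        Animals.countDocuments({}),      // accurate total of the whole set
        Animals.find(pageFilter)
            .sort({ createdAt: -1 })     // walk the index, newest first
            .limit(pageSize)
            .exec()
    ]);
    return { total, docs };
}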
31

Instead of using 2 separate queries, you can use aggregate() in a single query:

Aggregate "$facet" can be fetch more quickly, the Total Count and the Data with skip & limit

    db.collection.aggregate([

      //{$sort: {...}}
      //{$match: {...}}

      { $facet: {
        // stage1: count all matched documents
        "stage1": [ { "$group": { _id: null, count: { $sum: 1 } } } ],
        // stage2: the requested page
        "stage2": [ { "$skip": 0 }, { "$limit": 2 } ]
      }},

      { $unwind: "$stage1" },

      // output projection
      { $project: {
        count: "$stage1.count",
        data: "$stage2"
      }}
    ]);

The output is as follows:

[{
    count: 50,
    data: [
        {...},
        {...}
    ]
}]

Also, have a look at https://docs.mongodb.com/manual/reference/operator/aggregation/facet/
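
For completeness, here is one way this pipeline might be consumed from Node.js, assuming the official driver and a hypothetical `animals` collection:

// Sketch: run the $facet pipeline above via the official Node.js driver
// and unpack the single result document. `db` is a connected Db instance.
async function fetchPage(db, skip, limit) {
    const [result] = await db.collection('animals').aggregate([
        { $facet: {
            stage1: [ { $group: { _id: null, count: { $sum: 1 } } } ],
            stage2: [ { $skip: skip }, { $limit: limit } ]
        } },
        { $unwind: '$stage1' },
        { $project: { count: '$stage1.count', data: '$stage2' } }
    ]).toArray();
    // On an empty collection $unwind emits nothing, so result is undefined.
    return result || { count: 0, data: [] };
}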

DhineshYes
6

db.collection_name.aggregate([
    { '$match': {} },
    { '$sort': { '_id': -1 } },
    { '$facet': {
        metadata: [ { $count: "total" } ],
        // add a projection here if you wish to re-shape the docs
        data: [ { $skip: 1 }, { $limit: 10 }, { '$project': { "_id": 0 } } ]
    } }
])

Instead of using two queries to find the total count and then skip the matched records, $facet is the best and most optimized way. In a single query it can:

  1. Match the records
  2. Find the total count
  3. Skip the records
  4. Reshape the data according to our needs
SANJEEV RAVI
4

There is a library that will do all of this for you; check out mongoose-paginate-v2.
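
A small usage sketch based on the library's documented API (the model and option values here are illustrative):

// mongoose-paginate-v2 sketch: register the plugin, then call Model.paginate().
// Result field names (docs, totalDocs) follow the library's README.
const mongoose = require('mongoose');
const mongoosePaginate = require('mongoose-paginate-v2');

const animalSchema = new mongoose.Schema({ name: String });
animalSchema.plugin(mongoosePaginate);
const Animal = mongoose.model('Animal', animalSchema);

// the question's page: 10 documents offset by 20
Animal.paginate({}, { offset: 20, limit: 10 }).then(result => {
    // result.docs      -> the page of documents
    // result.totalDocs -> total matching count
});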

Dev01
  • It never occurred to me that there would be such a library. I saw the comment after designing the whole structure. Sad day :( – HyopeR Apr 25 '22 at 16:17
3

After having to tackle this issue myself, I would like to build upon user854301's answer.

With Mongoose ^4.13.8 I was able to use a function called toConstructor(), which allowed me to avoid building the query multiple times when filters are applied. I know this function is available in older versions too, but you'll have to check the Mongoose docs to confirm this.

The following uses Bluebird promises:

let schema = Query.find({ name: 'bloggs', age: { $gt: 30 } });

// save the query as a 'template'
let query = schema.toConstructor();

return Promise.join(
    schema.count().exec(),                  // total number of matching documents
    query().limit(limit).skip(skip).exec(), // the requested page

    function (total, data) {
        return { data: data, total: total };
    }
);

Now the count query will return the total number of records it matched, and the data returned will be a subset of the total records.

Please note the parentheses after `query`, which construct a new query from the template.

oli_taz
3

You don't have to use two queries or one complicated query with aggregate and such.

You can use a single query. For example:

const getNames = async (queryParams) => {
    const cursor = db.collection.find(queryParams).skip(20).limit(10);
    return {
        count: await cursor.count(), // full match count, ignoring skip/limit
        data: await cursor.toArray()
    };
};

Mongo returns a cursor that has predefined functions such as `count()`, which will return the full count of the queried results regardless of `skip` and `limit`.

So in the `count` property you will get the full number of matched documents, and in `data` just the chunk with an offset of 20 and a limit of 10 documents.
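
One caveat: `cursor.count()` was deprecated and later removed in newer versions of the Node.js driver (4.x+), so on a current driver the same shape needs a separate `countDocuments()` call, roughly:

// Equivalent sketch for the modern driver, where cursor.count() is gone;
// countDocuments() takes the same filter. Placeholder names follow the answer.
const getNames = async (queryParams) => {
    const [count, data] = await Promise.all([
        db.collection.countDocuments(queryParams),   // full match count
        db.collection.find(queryParams).skip(20).limit(10).toArray()
    ]);
    return { count, data };
};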

2

Thanks to Igor Igeto Mitkovski; the best solution is to use the native connection.

The documentation is here: https://docs.mongodb.com/manual/reference/method/cursor.count/#mongodb-method-cursor.count, and Mongoose doesn't support it (https://github.com/Automattic/mongoose/issues/3283), so we have to use the native connection.

// query the native driver collection directly, bypassing Mongoose
const query = StudentModel.collection.find(
    { age: 13 },
    { projection: { _id: 0 } }
).sort({ time: -1 })

const count = await query.count()
const records = await query.skip(20).limit(10).toArray()
Baryon Lee