
While reading the documentation, I found the following note:

When a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory. This optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.
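One way to picture this top-n optimization is a bounded buffer that never holds more than n items. This is only an illustrative sketch in plain JavaScript, not MongoDB's actual implementation:

```javascript
// Illustrative top-n sort: keep at most n items in memory instead of
// sorting the full input. Not MongoDB's real internals.
function topN(values, n, cmp = (a, b) => a - b) {
  const buffer = [];
  for (const v of values) {
    buffer.push(v);
    buffer.sort(cmp);
    if (buffer.length > n) buffer.pop(); // discard the worst element
  }
  return buffer;
}

console.log(topN([5, 1, 9, 3, 7, 2], 3)); // [1, 2, 3]
```

Memory use is bounded by n regardless of how many documents flow through, which is exactly why a $limit directly after a $sort helps.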

If I understand this correctly, it applies only when $sort and $limit are used together, like

db.coll.aggregate([
    ...,
    {$sort: ...},
    {$limit: limit},
    ...
]);

However, I think most of the time we would have

db.coll.aggregate([
    ...,
    {$sort: ...},
    {$skip: skip},
    {$limit: limit},
    ...
]);

Question 1: Does this mean the rule above doesn't apply if I use $skip here?

I ask because, in theory, MongoDB could still compute just the top skip + limit records and improve performance by sorting only those. I didn't find any documentation about this, though. And if the rule doesn't apply:

Question 2: Do I need to change my query to the following to enhance performance?

db.coll.aggregate([
    ...,
    {$sort: ...},
    {$limit: skip + limit},
    {$skip: skip},
    {$limit: limit},
    ...
]);

EDIT: I think explaining my use case will make the question above make more sense. I'm using the text search feature provided by MongoDB 2.6 to look for products. I'm worried that if a user inputs a very common keyword like "red", too many results will be returned. So I'm looking for a better way to generate this result.

EDIT2: It turns out that the last snippet above is equivalent to

db.coll.aggregate([
    ...,
    {$sort: ...},
    {$limit: skip + limit},
    {$skip: skip},
    ...
]);

Thus we can always use this form to make the top-n rule apply.
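The equivalence is easy to check outside MongoDB by modeling the two pipeline shapes as array operations. A minimal sketch (function names are mine, not MongoDB API):

```javascript
// Simulate the two pipeline shapes on a plain array and check that
// they produce the same page of results.
function limitThenSkip(docs, skip, limit) {
  // [{$sort}, {$limit: skip + limit}, {$skip: skip}]
  return docs
    .slice()                       // copy before sorting
    .sort((a, b) => a - b)
    .slice(0, skip + limit)        // $limit: skip + limit
    .slice(skip);                  // $skip: skip
}

function skipThenLimit(docs, skip, limit) {
  // [{$sort}, {$skip: skip}, {$limit: limit}]
  return docs
    .slice()
    .sort((a, b) => a - b)
    .slice(skip, skip + limit);    // $skip then $limit
}

const data = [9, 3, 7, 1, 8, 5, 2, 6, 4, 0];
console.log(limitThenSkip(data, 3, 4)); // [3, 4, 5, 6]
console.log(skipThenLimit(data, 3, 4)); // [3, 4, 5, 6]
```

Both forms return the same page; the first merely makes the limit visible to the $sort stage so the top-n rule can kick in.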

yaoxing

4 Answers


Since this is a text search query we are talking about, the most optimal form is this:

db.collection.aggregate([
    { "$match": {
        "$text": { "$search": "cake tea" }
    }},
    { "$sort": { "score": { "$meta": "textScore" } } },
    { "$limit": skip + limit },
    { "$skip": skip }
])

The rationale of the memory reserve from the top "sort" results will only work within its own "limits", as it were, and this will not be optimal for anything beyond a few reasonable "pages" of data.

Beyond what is reasonable for memory consumption, the additional stage will likely have a negative effect rather than positive.

These really are the practical limitations of the text search capabilities available to MongoDB in the current form. But for anything more detailed and requiring more performance, then just as is the case with many SQL "full text" solutions, you are better off using an external "purpose built" text search solution.

Legna
Neil Lunn
  • You say in the current form. Is there work underway to enhance MongoDB text search, do you know? There are some great comments here on using Solr in conjunction with MongoDB http://stackoverflow.com/questions/3215029/nosql-mongodb-vs-lucene-or-solr-as-your-database, – John Powell Jun 11 '14 at 11:19
  • @JohnBarça The answer you seek is actually more "official" and slightly loaded in nature. IMO MongoDB admittedly does not try to be an "optimal" key/value store nor does it try to implement every feature of a traditional relational system as a "database" goes. The extension of this is that a general purpose "database" generally does not "go in for" specialized areas such as "text search". But that is an opinion, and perspectives are often subject to change. By all means, use what works best. – Neil Lunn Jun 11 '14 at 11:35
  • Interesting. I have been dabbling in Mongo and really like certain features. But I hear what you are saying. I'm a GIS guy and I like the geojson stuff that has been done and the aggregation spatial enhancements, but in terms of functionality still a long way from being able to leave Postgres/Postgis. I accept this is a very niche area, though. – John Powell Jun 11 '14 at 12:09
  • @JohnBarça I agree with you guys. This is only a temp solution with which I can do it quick and simple. We did think of integrating a search engine. But not until the next phase because it would have brought too much extra work load now. And it has been much better than the "like" search we are using now:) – yaoxing Jun 11 '14 at 13:34
  • I kept thinking about this, and I think I understand why MongoDB only allows $limit, not $skip, to apply the **top n** rule: skip+limit can always be turned into limit+skip. I edited my question. – yaoxing Jun 12 '14 at 02:03
  • @NeilLunn You have started with the statement that, "since it is a text search we are talking about" - but to clarify - the method to immediately follow up "limit" after "sort" is the optimal way in order to minimise memory consumption from the aggregate chain after sorting, however small amount of memory we may gain from it, independent of whether it is a text search or a normal match by unwinding an array. Is that correct? – Sundar Aug 17 '15 at 14:17
  • Actually this is the right way to do it: because we're skipping, we must include the items we're skipping in the limit of the records. I'm such an idiot. Thank you, good guy, you saved my day :] – moolsbytheway Dec 03 '17 at 20:14
  • @NeilLunn what would be the possible solution for this, https://stackoverflow.com/questions/59573513/skip-and-limit-for-pagination-for-a-mongo-aggregate – KcH Jan 03 '20 at 09:42
  • So... why aren't we suggesting to first skip, and then limit? It's by far the most clear and simple solution: [ ..., { $skip : skip }, { $limit : limit } ] -- it's more logical, and doesn't entangle variables of the separate steps in the pipeline.. – Ciabaros Aug 20 '20 at 19:19
  • skip before limit – Fayaz May 01 '23 at 10:53

Answer: $skip before $limit

The order of $skip and $limit definitely matters, at least for aggregations. I just tried this; I don't know how it was missed. Maybe it has changed since the OP, but I thought I would share.

I agree with @vkarpov15's comment in this conversation

In aggregate, $limit limits the number of documents sent to the next aggregation stage, and $skip skips the first N documents, so if $skip is after $limit and $skip >= $limit, you won't get any results. In short, this is expected behavior in MongoDB.
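The behavior described in the quote can be simulated on a plain array, applying the stages in pipeline order (a sketch, not MongoDB itself):

```javascript
// Model $limit and $skip as array operations applied in pipeline order.
const docs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// [{$limit: 5}, {$skip: 5}]: limiting first leaves 5 docs,
// then skipping 5 of them leaves nothing.
const limitFirst = docs.slice(0, 5).slice(5);
console.log(limitFirst); // []

// [{$skip: 5}, {$limit: 5}]: skip first, then limit -- a normal page.
const skipFirst = docs.slice(5).slice(0, 5);
console.log(skipFirst); // [6, 7, 8, 9, 10]
```

So with $skip >= $limit in the wrong order, the page is always empty, exactly as the quote says.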

Trevor Njeru
  • Wow, thank you! I was struggling with `skip` and `limit` and was wondering why on earth MongoDB was behaving like that (limit = limit + skip)! By switching the order, limit now behaves as I expected. – Positivity Jul 01 '22 at 15:25

I found that the sequence of $limit and $skip seems immaterial: even if I specify $skip before $limit, MongoDB applies $limit before $skip under the hood.

> db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
{
    "op" : "command",
    "ns" : "archiprod.userinfos",
    "command" : {
        "aggregate" : "userinfos",
        "pipeline" : [
            {
                "$sort" : {
                    "updatedAt" : -1
                }
            },
            {
                "$limit" : 625
            },
            {
                "$skip" : 600
            }
        ],
    },
    "keysExamined" : 625,
    "docsExamined" : 625,
    "cursorExhausted" : true,
    "numYield" : 4,
    "nreturned" : 25,
    "millis" : 25,
    "planSummary" : "IXSCAN { updatedAt: -1 }",
    /* Some fields are omitted */
}

What happens if I switch $skip and $limit? I got the same result in terms of keysExamined and docsExamined.

> db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
{
    "op" : "command",
    "ns" : "archiprod.userinfos",
    "command" : {
        "aggregate" : "userinfos",
        "pipeline" : [
            {
                "$sort" : {
                    "updatedAt" : -1
                }
            },
            {
                "$skip" : 600
            },
            {
                "$limit" : 25
            }
        ],
    },
    "keysExamined" : 625,
    "docsExamined" : 625,
    "cursorExhausted" : true,
    "numYield" : 5,
    "nreturned" : 25,
    "millis" : 71,
    "planSummary" : "IXSCAN { updatedAt: -1 }",
}

I then checked the explain result of the query. I found that totalDocsExamined is already 625 in the limit stage.

> db.userinfos.explain('executionStats').aggregate([ { "$sort" : { "updatedAt" : -1 } }, { "$limit" : 625 }, { "$skip" : 600 } ])
{
    "stages" : [
        {
            "$cursor" : {
                "sort" : {
                    "updatedAt" : -1
                },
                "limit" : NumberLong(625),
                "queryPlanner" : {
                    "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "updatedAt" : -1
                            },
                            "indexName" : "updatedAt_-1",
                        }
                    },
                },
                "executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 625,
                    "executionTimeMillis" : 22,
                    "totalKeysExamined" : 625,
                    "totalDocsExamined" : 625,
                    "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 625,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 625,
                        "advanced" : 625,
                        "docsExamined" : 625,
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "nReturned" : 625,
                            "works" : 625,
                            "advanced" : 625,
                            "keyPattern" : {
                                "updatedAt" : -1
                            },
                            "indexName" : "updatedAt_-1",
                            "keysExamined" : 625,
                        }
                    }
                }
            }
        },
        {
            "$skip" : NumberLong(600)
        }
    ]
}

And surprisingly, switching $skip and $limit produces the same explain result.

> db.userinfos.explain('executionStats').aggregate([ { "$sort" : { "updatedAt" : -1 } }, { "$skip" : 600 }, { "$limit" : 25 } ])
{
    "stages" : [
        {
            "$cursor" : {
                "sort" : {
                    "updatedAt" : -1
                },
                "limit" : NumberLong(625),
                "queryPlanner" : {
                    /* Omitted */
                },
                "executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 625,
                    "executionTimeMillis" : 31,
                    "totalKeysExamined" : 625,
                    "totalDocsExamined" : 625,
                    /* Omitted */
                }
            }
        },
        {
            "$skip" : NumberLong(600)
        }
    ]
}

As you can see, even though I specified $skip before $limit, in the explain result, it's still $limit before $skip.
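The rewrite suggested by the explain output can be sketched as a small pipeline transformation. This is only an illustration of the observed behavior, not MongoDB's internal code:

```javascript
// Sketch of the $skip/$limit coalescence the explain output suggests:
// a $skip followed by a $limit becomes a $limit of (skip + limit)
// followed by the $skip, so the limit can be pushed down to the query.
function coalesceSkipLimit(pipeline) {
  const out = [];
  for (let i = 0; i < pipeline.length; i++) {
    const stage = pipeline[i];
    const next = pipeline[i + 1];
    if (stage.$skip !== undefined && next && next.$limit !== undefined) {
      out.push({ $limit: stage.$skip + next.$limit });
      out.push({ $skip: stage.$skip });
      i++; // consume the $limit we just merged
    } else {
      out.push(stage);
    }
  }
  return out;
}

console.log(coalesceSkipLimit([
  { $sort: { updatedAt: -1 } },
  { $skip: 600 },
  { $limit: 25 }
]));
// → [ { $sort: { updatedAt: -1 } }, { $limit: 625 }, { $skip: 600 } ]
```

This matches the profiler output above: a pipeline written as $skip: 600, $limit: 25 executes as $limit: 625 followed by $skip: 600.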

lzl124631x
  • How can I keep $limit unchanged? I need to test this in a loop, and every time I increase skip by a constant, the execution time (TTFB) of each request increases, because $limit always becomes $skip + $limit. How can we solve that? – O.k Apr 27 '21 at 23:12

In simple terms, I came to the conclusion that it doesn't matter. Assuming no filters in the query: if you first skip 10 docs and then limit to 5 docs, explain shows the query returns those 5 docs and examines 15 docs in total. Kindly judge my analysis.