Is db.inventory.find().limit(10)
faster than db.inventory.find()
?
I have millions of records in mongodb, I want to get top 10 records in some orders.
Is db.inventory.find().limit(10)
faster than db.inventory.find()
?
I have millions of records in mongodb, I want to get top 10 records in some orders.
Using limit()
you inform the server that you will not retrieve more than k documents. Allowing some optimizations to reduce bandwidth consumption and to speed-up sorts. Finally, using a limit clause the server will be able to better use the 32MB max available when sorting in RAM (i.e.: when sort order cannot be obtained from an index).
Now, the long story: find()
returns a cursor. By default, the cursor will transfer the results to the client in batches. From the documentation,:
For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes.
Using limit()
the cursor will not need to retrieve more documents than necessary. Thus reducing bandwidth consumption and latency.
Please notice that, given your use case, you will probably use a sort()
operation as well. From the same documentation as above:
For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.
And the sort() documentation page explains further:
If MongoDB cannot obtain the sort order via an index scan, then MongoDB uses a top-k sort algorithm. This algorithm buffers the first k results (or last, depending on the sort order) seen so far by the underlying index or collection access. If at any point the memory footprint of these k results exceeds 32 megabytes, the query will fail1.
1That 32 MB limitation is not specific to sort using a limit()
clause. Any sort whose order cannot be obtained from an index will suffer from the same limitation. However, with a plain sort the server need to hold all documents in its memory to sort them. With a limited sort, it only have to store k documents in memory at the same time.
if you need it in order then of course the DB would first sort it based on the criteria and then return the top 10 records. by using the limit you are just saving the network bandwidth. e.g. here I am sorting by name and then giving the top 10 records, it has to scan the whole data and then pick the top 10. (as you can notice its doing COLLSCAN which is understood for collection scan as I don't have the index for this example, the idea to show here is that its doing the full scan of all the records, sort it and then pick the top ones.)
> db.t1.find().sort({name:1}).limit(10).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.t1",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"name" : 1
},
"limitAmount" : 10,
"inputStage" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Sachin-Mac.local",
"port" : 27017,
"version" : "3.0.2",
"gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f"
},
"ok" : 1
}