1

I want to retrieve the last 20 documents in my large collection in an efficient manner.

This SO post offered this performant solution - but it does not answer my question because my question is specifically dealing with _id index - :

db.collectionName.find().min(minCriteria).hint(yourIndex).limit(N)

However, my collection just contains the default index (_id). I'm just not sure what min criteria would be - I obviously don't want to hardcode an _id value, as the collection is periodically emptied.

itemsCollection.find().min(<minCriteria>).hint({_id:1}).limit(20)

Is there any way to use min with the _id index? Or is my only option creating a new index?

maddie
  • 1,854
  • 4
  • 30
  • 66
  • What is the MongoDB version you are using? Is it a cluster (replica-set, sharded) or a standalone? – prasad_ Mar 14 '20 at 02:20
  • @prasad_ three node replica-set – maddie Mar 14 '20 at 03:19
  • The [docs](https://docs.mongodb.com/manual/reference/method/cursor.min/) say "min() exists primarily to support the mongos process"; so what is your use case? Are you looking for performance (_"I want to retrieve the last 20 documents in my large collection in an efficient manner."_)? What is the functionality you are looking for? – prasad_ Mar 14 '20 at 03:43
  • most efficient manner – maddie Mar 14 '20 at 03:50
  • 1
    can you run `db.collection.find({}).sort({ _id: -1 }).limit(20).explain("executionStats")` and tell us what value you get for `executionTimeMillis` – Dĵ ΝιΓΞΗΛψΚ Mar 14 '20 at 04:10
  • With the explain result, you will see something like this: `"nReturned" : 20, "executionTimeMillis" : 0 (can be different), "totalKeysExamined" : 20, "totalDocsExamined" : 20,...` - which is what you want, I think. Note, the sort is performed using the available index. – prasad_ Mar 14 '20 at 04:16
  • @prasad_ so is sorting by id index and limiting not a good way to get the last n records? – Dĵ ΝιΓΞΗΛψΚ Mar 14 '20 at 04:17
  • @Đĵ ΝιΓΞΗΛψΚ It works fine; I can see it in the plan results (I have in my previous comment). That is good. – prasad_ Mar 14 '20 at 04:38

1 Answers1

0

Yes, you can use min with the _id index, as long as your <minCriteria> only reference the _id field.

If your min criteria is on something other than _id, you will need to create an index on that criteria in order to avoid this query being a full collection scan.

The min() cursor method is for establishing a lower bound for the index scan that will service the query. This is probably not what you are looking for to retrieve the most recently added documents.

Assuming each document's _id field contains an ObjectId or some other value that sorts in the order they were inserted, then you can, as noted in the comments, do a reverse sort on _id and limit to the number of documents desired, which can be very efficient.

This query should automatically use the _id index:

db.itemsCollection.find().sort({_id:-1}).limit(20)

The date part of the ObjectId is determined by the system creating the value, which in some cases is a client/application server. This means that clock drift may affect the ordering.

If you want to get the documents that were most recently inserted into the collection, you can use natural order:

db.itemsCollection.find().sort({$natural:-1}).limit(20)

This doesn't use an index, but it should still be fairly performant because it will only scan the number of documents you want to return.

Joe
  • 25,000
  • 3
  • 22
  • 44
  • Could you provide an example of min criteria? – maddie Mar 13 '20 at 22:40
  • That depends on the documents in the collection, and you'll have to specify the criteria that matches the index you want to use. See https://docs.mongodb.com/manual/reference/method/cursor.min/index.html#cursor-min – Joe Mar 13 '20 at 22:43
  • Sorry my question didn't make, my original post explained that min criteria would be - I obviously don't want to hardcode an _id value, as the collection is periodically emptied. And so I'm wondering if its possible to use min on a collection only indexed with _id - and furthermore, what the min criteria would be in this case? Would min criteria not just be a hardcoded _id? Which doesn't seem performant – maddie Mar 14 '20 at 03:15
  • I should have recognized the XY problem straight off. I'll edit the question to address the main objective. – Joe Mar 14 '20 at 07:55