
I need to pick a document from a collection at random (or, alternatively, a small number of successive documents from a randomly positioned "window"). I've found two solutions: 1 and 2. The first is unacceptable, since I anticipate a large collection and want to keep the document size to a minimum. The second seems inefficient (I'm not sure about the complexity of the skip operation). And here one can find a mention of querying a document by a specified index, but I don't know how to do that (I'm using the C++ driver).

Are there other solutions to the problem? Which is the most efficient?

Violet Giraffe
  • There is a [feature request to get random items from a collection](https://jira.mongodb.org/browse/SERVER-533) in the MongoDB ticket tracker. If implemented natively, it would likely be the most efficient option. (If you want the feature, go vote it up.) – David J. Jun 17 '12 at 02:31
  • This question has been asked in many forms here on Stack Overflow. The most popular question is [Random record from MongoDB](http://stackoverflow.com/questions/2824157/random-record-from-mongodb) -- it has good responses. That said, I think the best way of thinking about the question is not to think about getting one random document but, rather, randomizing a result set. See [Ordering a result set randomly in Mongo](http://stackoverflow.com/questions/8500266/ordering-a-result-set-randomly-in-mongo) for that. – David J. Jun 17 '12 at 02:43

2 Answers


It seems like you could adapt solution 1: assuming your _id key is an auto-incrementing value, do a count on your records, use that count as the upper limit for a random integer in C++, and then fetch the document with that _id.

Likewise, if you don't have an auto-increment _id key, just create one alongside your data. An additional INT field shouldn't add much to your document size.

If you don't have an auto-increment field, the MongoDB docs explain how to add one quickly here:

Auto Inc Field.

Petrogad
  • I'm not sure whether I have an auto-increment _id or not. I was hoping to avoid it. My documents have an ID field, and I call `ensureIndex` on that field every time I insert a new doc. I'm new to Mongo, so I can't really tell. – Violet Giraffe Nov 10 '11 at 14:46
  • Is it possible to query for a document not by an exact index match, but by the index value closest to the one I've specified? That should be as fast as a regular indexed query, and it would solve my problem. – Violet Giraffe Nov 10 '11 at 14:48
  • The thing with Mongo's IDs is that if you're using the default ObjectId (which Mongo generates), they follow BSON's ObjectId format: http://www.mongodb.org/display/DOCS/Object+IDs . You can override this by supplying your own _id values when you first create each document; you just need to make sure they are always unique. – Petrogad Nov 10 '11 at 15:48
  • Updated the answer with the auto-increment field approach that Mongo describes. Hope this helps! – Petrogad Nov 10 '11 at 15:50

I had a similar issue once. In my case, my documents had a date property. I knew the earliest possible date in the dataset, so in my application code I would generate a random date between EARLIEST_DATE_IN_SET and NOW, then query MongoDB with a GTE query on the date property, limited to 1 result.

There was a small chance that the random date would be later than the latest date in the data set, so I accounted for that in the application code.

With an index on the date property, this was a super fast query.

Bryan Migliorisi
  • Thanks, I went with this option. I have just profiled my application, and I wish all Mongo accesses were as fast as picking a random document with your method :) – Violet Giraffe Nov 10 '11 at 21:00