I want a single random document from a MongoDB collection. My collection contains more than 1 billion documents. How can I get a single random document from it?
`random.randrange(NUM_FILES)` – Joel Cornett Nov 23 '12 at 07:28
I have never worked with MongoDB from Python, but there is a general solution to your problem. Here is a MongoDB shell script for obtaining a single random document:
N = db.collection.count(condition)
db.collection.find(condition).limit(1).skip(Math.floor(Math.random()*N))
Here, condition is a MongoDB query. If you want to query the entire collection, set condition to an empty query ({}).
It's a general solution, so it works with any MongoDB driver.
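Since the question is about Python, here is a minimal sketch of the same skip-based idea for pymongo. It assumes collection is a pymongo Collection; it only calls count_documents, find, skip and limit, so any object exposing those methods the same way works. The function name random_document_via_skip is hypothetical. It still relies on skip, so it is slow on very large collections.

```python
import random


def random_document_via_skip(collection, condition=None, rng=random):
    """Return one random document matching `condition`, or None.

    Assumes `collection` behaves like a pymongo Collection: it must provide
    count_documents(filter) and find(filter) returning a cursor that supports
    .skip(n) and .limit(n).
    """
    condition = condition if condition is not None else {}
    n = collection.count_documents(condition)
    if n == 0:
        return None
    offset = rng.randrange(n)  # uniform integer in [0, n)
    # The server still walks past `offset` documents, which is why this
    # approach gets slow on very large collections.
    for doc in collection.find(condition).skip(offset).limit(1):
        return doc
    return None  # documents were removed between the count and the find
```

Usage would look like random_document_via_skip(MongoClient()["mydb"]["mycollection"]), where the database and collection names are placeholders.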
Update
I ran a benchmark to test several implementations. First, I created a test collection with 5,567,249 documents, each with an indexed random field rnd.
I compared three methods:
First method:
db.collection.find().limit(1).skip(Math.floor(Math.random()*N))
Second method:
db.collection.find({rnd: {$gte: Math.random()}}).sort({rnd:1}).limit(1)
Third method:
db.collection.findOne({rnd: {$gte: Math.random()}})
I ran each method 10 times and recorded its average execution time:
method 1: 882.1 msec
method 2: 1.2 msec
method 3: 0.6 msec
This benchmark shows that my solution is not the fastest one.
But the third solution is not a good one either, because it returns the first document in the database (in natural order) with rnd >= random(). So its output is not truly random.
I think the second method is the best one for frequent use. But it has one drawback: it requires modifying every document in the collection and creating an additional index.
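A minimal pymongo sketch of the second method, assuming every document already has a float field rnd in [0, 1) and that rnd is indexed. The function name random_document_via_rnd is hypothetical. If the random number is larger than every stored rnd, the $gte query returns nothing, so the sketch falls back to the document with the largest rnd below it.

```python
import random


def random_document_via_rnd(collection, rng=random):
    """Method 2: first document with rnd >= r in index order, else wrap around.

    Assumes `collection` behaves like a pymongo Collection and that every
    document has a numeric `rnd` field in [0, 1) backed by an index.
    """
    r = rng.random()
    # Smallest rnd that is >= r (served by the index on rnd)
    for doc in collection.find({"rnd": {"$gte": r}}).sort("rnd", 1).limit(1):
        return doc
    # r was larger than every stored rnd: take the largest rnd below r instead
    for doc in collection.find({"rnd": {"$lt": r}}).sort("rnd", -1).limit(1):
        return doc
    return None  # empty collection
```

The explicit sort is what makes the lookup follow rnd order rather than natural order, which is the difference between methods 2 and 3.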

I decided to run a benchmark to test it. I'll post my results here. – Leonid Beschastny Nov 23 '12 at 10:33
"But the third solution is not a good one either, because it finds the first element in database" natural order and find order are two different things. find order is actually mostly random in fact, not totally but it does have an element of randomness about it – Sammaye Nov 23 '12 at 17:47
+1 #2 is the `One True Way` until mongodb adds a good way to do this out of the box. – Eve Freeman Nov 26 '12 at 10:24
The problem is that the computer will take some time to generate a random number. – Hitul Mistry Nov 19 '13 at 06:05
Add an additional field named random to your collection and make sure its value is between 0 and 1. You can assign a random floating-point number between 0 and 1 to this field for each record using random.random() (for example, [random.random() for _ in range(0, 10)] produces ten such values).
Then:
import random
from pymongo import MongoClient

db = MongoClient()["database_name"]  # placeholder database name
collection = db["collection_name"]
rand = random.random()  # rand will be a floating-point number between 0 and 1
random_record = collection.find_one({'random': {'$gte': rand}})
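The snippet above assumes every document already has the random field. A minimal sketch of filling it in and indexing it with pymongo follows; the function name backfill_random_field is hypothetical, and it only uses find, update_one and create_index.

```python
import random


def backfill_random_field(collection, field="random", rng=random):
    """Give every document that lacks `field` a uniform float in [0, 1).

    Assumes `collection` behaves like a pymongo Collection. Returns the number
    of documents updated, then makes sure `field` is indexed.
    """
    updated = 0
    # Only fetch _id, and only for documents that do not have the field yet,
    # so the function can be re-run safely after an interruption.
    for doc in collection.find({field: {"$exists": False}}, {"_id": 1}):
        collection.update_one({"_id": doc["_id"]}, {"$set": {field: rng.random()}})
        updated += 1
    collection.create_index(field)  # needed for the $gte lookup to be fast
    return updated
```

With a billion documents, one round trip per update will take a long time; batching the updates with pymongo's bulk_write would be the natural next step, but that is left out of this sketch.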
MongoDB will have a native implementation in due course. The feature request is filed here: https://jira.mongodb.org/browse/SERVER-533
It is not yet implemented at the time of writing.

You should not have to modify your data to do this. It might not even be your data! – will Nov 23 '12 at 07:34
We are not modifying the original data. We are adding a new column to it and generating a random floating point from 0 to 1, associated to the data. – Calvin Cheng Nov 23 '12 at 07:37
Adding a field to each document requires modifying each document, which is modifying the data. What if it is someone else's database you only have read access to? This is a read problem. You should not have to litter the dataset. – will Nov 23 '12 at 07:39
will, I don't really agree. This answer is a good general one even if it doesn't fit every situation. Wikipedia, for example, uses this solution for its random page function. – Emil Vikström Nov 23 '12 at 07:50
@CalvinCheng: you should also check for $lt, as you might have rand bigger than any 'random' field. – mrówa Nov 24 '12 at 03:22
@mrówa With a billion elements, the likelihood of this happening is almost none. If you want to be sure that you'll always get something, set one random value to 1.0 (of course, this might very slightly skew the results). – Eve Freeman Nov 24 '12 at 04:01
@Wes: it doesn't hurt to add a null check on random_record with a fallback $lt query, since the $lt query won't run until such a case occurs. I understand that it might not happen at all. But why not check for it if it's that simple? And why create a custom workaround for such a simple case? Why even bother? – mrówa Nov 24 '12 at 21:37
Btw, you do need to sort/limit your results to achieve truly random selection. I found this out the hard way today (after having implemented this weeks ago). – Eve Freeman Nov 26 '12 at 10:25
Since MongoDB 3.2, this can be done with the aggregate function and the $sample operator, as described in the docs. It's very fast. The following code randomly selects 20 documents from the collection:
db.collection.aggregate( [ { $sample: {size: 20} } ] )
If you need to select random documents that match specific criteria, you can combine it with the $match operator:
db.collection.aggregate([
{ $sample: {size: 20} },
{ $match:{"yourField": value} }
])
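Beware of the order! On my small database of around 100k documents, the command above takes 15 ms, while with the order switched it takes 1750 ms (more than 100 times slower). The reason is that with $sample first, $match only has to examine the 20 sampled documents instead of the whole collection. However, with this order you only get the subset of those 20 random documents that matches the criteria, so you may get fewer than 20 documents, or none at all.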
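For use from Python, here is a small sketch that builds either pipeline order as plain data, so the trade-off is an explicit parameter. The function name sample_pipeline and its parameters are hypothetical; the optional fields argument appends a $project stage to keep only some fields (MongoDB keeps _id unless it is excluded).

```python
def sample_pipeline(size, match=None, match_first=True, fields=None):
    """Build an aggregation pipeline that returns up to `size` random documents.

    match_first=True  -> [$match, $sample]: `size` random documents drawn from
                         the matching ones (slower, but usually what you want).
    match_first=False -> [$sample, $match]: `size` random documents from the
                         whole collection, then filtered, so possibly fewer.
    """
    sample = {"$sample": {"size": size}}
    stages = [sample]
    if match:
        stage = {"$match": match}
        stages = [stage, sample] if match_first else [sample, stage]
    if fields:
        # Keep only the listed fields (plus _id, which $project keeps by default)
        stages.append({"$project": {name: 1 for name in fields}})
    return stages
```

With pymongo this would be called as list(collection.aggregate(sample_pipeline(20, {"yourField": value}))), where collection and value are placeholders.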

I'm new to mongo so apologies in advance for the stupid question: If I used `$sample: {size: 1}` how would I then select only a single key from that random record? – RoyalTS May 04 '17 at 23:18
In a performant manner? It is hard, to say the least, without changing your data.
Imagine rand() gives you 1,000,000 and you skip that many of the 1 billion documents. That will be slow, very slow, because MongoDB does not make effective use of indexes when skipping.
As @Calvin said, MongoDB has a feature request for getting random documents; however, it is not yet implemented.
If you need to do this regularly, the most performant way at the moment is to add an auto-incrementing id to your records (http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field) and pick a rand() value within its range.
Edit
To clarify: when using the auto-incrementing id, you will need to run one query initially (unless you keep track of it another way) to get the highest value of the field. You can either query the counter collection, or query the collection itself sorted in reverse (sort({field:-1})) with limit(1), to get the upper bound for rand().
You also need to take changes in the data into account (for example, deleted documents leave gaps in the ids), which means you actually want the first document whose id is $gte that random position.
My idea is explained in more detail here: php mongodb find nth entry in collection
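A minimal pymongo sketch of the auto-incrementing id approach, assuming the counter field is called seq (a hypothetical name), holds integers starting at 1 and is indexed. It finds the highest value by sorting in reverse with limit(1), draws a random integer up to it, and takes the first document at or above that position so gaps left by deleted documents are skipped. The function name random_document_via_seq is also hypothetical.

```python
import random


def random_document_via_seq(collection, field="seq", rng=random):
    """Pick a random document using an auto-incrementing integer `field`.

    Assumes `collection` behaves like a pymongo Collection and that `field`
    holds integers starting at 1 with an index on it.
    """
    # Highest id: sort in reverse and keep one document.
    top = list(collection.find({}, {field: 1}).sort(field, -1).limit(1))
    if not top:
        return None
    max_id = top[0][field]
    r = rng.randint(1, max_id)  # inclusive on both ends
    # First document at or above the random position ($gte handles gaps).
    for doc in collection.find({field: {"$gte": r}}).sort(field, 1).limit(1):
        return doc
    return None  # the top document was deleted between the two queries
```

Caching max_id (or reading it from the counter collection) would save the first query when this runs often.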
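If your objects have integer ids, you could do something like the following, where maxId is a placeholder for the highest id in the collection:
db.collection.findOne({id: {$gte: Math.floor(Math.random() * maxId) + 1}})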

While this certainly will give a random document, it may not have a uniform distribution. – Emil Vikström Nov 23 '12 at 07:32
This would be a reasonable solution for many situations (given the integer IDs constraint). – Andy Triggs Feb 27 '14 at 11:07
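To make the uniformity concern concrete: when the ids have gaps, the "first id >= r" lookup favors any document that follows a gap. A small stdlib-only calculation of the exact pick probabilities, assuming r is a uniform random integer between 1 and the highest id (the function name gte_pick_distribution is hypothetical):

```python
from collections import Counter
from fractions import Fraction


def gte_pick_distribution(ids):
    """Exact probability of each id being picked by "first id >= r",
    where r is a uniform random integer in [1, max(ids)].
    """
    ids = sorted(ids)
    top = ids[-1]
    # For every possible r, record which id the $gte lookup would return.
    counts = Counter(next(i for i in ids if i >= r) for r in range(1, top + 1))
    return {i: Fraction(counts[i], top) for i in ids}
```

For ids 1, 2, 3 and 10, id 10 is picked with probability 7/10 while each of the others gets 1/10; with contiguous ids the distribution is uniform.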