10

I want a single random document from a MongoDB collection. My collection contains more than 1 billion documents. How can I get a single random document from that collection?

Hitul Mistry

5 Answers

21

I have never worked with MongoDB from Python, but there is a general solution to your problem. Here is a MongoDB shell script for obtaining a single random document:

N = db.collection.count(condition)
db.collection.find(condition).limit(1).skip(Math.floor(Math.random()*N))

Here, condition is a MongoDB query. If you want to query the entire collection, use condition = null.

It's a general solution, so it works with any MongoDB driver.
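Since the question is about Python, here is a minimal sketch of the same count-and-skip idea, assuming `collection` is a pymongo Collection. The function name `random_document_by_skip` is hypothetical, not part of any library.

```python
import random


def random_document_by_skip(collection, condition=None):
    """Return one random document matching `condition`, or None if nothing matches.

    Assumption: `collection` behaves like a pymongo Collection.
    """
    query = condition or {}
    # N in the shell script above: number of matching documents.
    n = collection.count_documents(query)
    if n == 0:
        return None
    # Uniform integer offset in [0, n).
    offset = random.randrange(n)
    cursor = collection.find(query).skip(offset).limit(1)
    # The cursor yields at most one document.
    return next(iter(cursor), None)
```

The server still has to walk past `offset` documents, so this gets slower as the collection grows.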


Update

I ran a benchmark to test several implementations. First, I created a test collection of 5567249 documents with an indexed random field rnd.

I chose three methods to compare with each other:

First method:

db.collection.find().limit(1).skip(Math.floor(Math.random()*N))

Second method:

db.collection.find({rnd: {$gte: Math.random()}}).sort({rnd:1}).limit(1)

Third method:

db.collection.findOne({rnd: {$gte: Math.random()}})

I ran each method 10 times and got its average computing time:

method 1: 882.1 msec
method 2: 1.2 msec
method 3: 0.6 msec

This benchmark shows that my solution is not the fastest one.

But the third solution is not a good one either, because it returns the first document in the database (in natural order) with rnd >= random(). So its output is not truly random.

I think the second method is the best one for frequent use. But it has one drawback: it requires modifying every document in the collection and maintaining an additional index.
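The second method might look like this from Python. This is a sketch that assumes `collection` is a pymongo Collection whose documents all carry an indexed `rnd` field in [0, 1). The function name and the fallback branch are my own additions for illustration; the fallback covers the case where the drawn value exceeds every stored `rnd`.

```python
import random


def random_document_by_field(collection, field="rnd"):
    """Return the document whose `field` is closest at or above a random draw."""
    r = random.random()
    # Smallest stored value >= r; the ascending sort is what makes this
    # differ from the plain findOne variant (method 3).
    doc = collection.find_one({field: {"$gte": r}}, sort=[(field, 1)])
    if doc is None:
        # r exceeded every stored value: fall back to the nearest value below r.
        doc = collection.find_one({field: {"$lt": r}}, sort=[(field, -1)])
    return doc
```

With an index on the field, both lookups can be answered from the index, so the cost does not depend on where in the collection the document sits.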

Leonid Beschastny
6

Add an additional field named random to each document in your collection and make sure its value is between 0 and 1. You can generate random floating-point values between 0 and 1 for this field with, for example, [random.random() for _ in range(0, 10)].

Then:

import random

collection = mongodb["collection_name"]

rand = random.random()  # rand is a float in the range [0.0, 1.0)
random_record = collection.find_one({'random': {'$gte': rand}})

MongoDB should get a native implementation in due course. The feature request is filed here: https://jira.mongodb.org/browse/SERVER-533

Not yet implemented at time of writing.

Calvin Cheng
  • 2
    You should not have to modify your data to do this. It might not even be your data! – will Nov 23 '12 at 07:34
  • We are not modifying the original data. We are adding a new column to it and generating a random floating point from 0 to 1, associated to the data. – Calvin Cheng Nov 23 '12 at 07:37
  • 1
    Adding a field to each document requires modifying each document, which is modifying the data. What if it is someone else's database you only have read access to? This is a read problem. You should not have to litter the dataset – will Nov 23 '12 at 07:39
  • 3
    will, I don't really agree. This answer is a good general one even if it doesn't fit every situation. Wikipedia, for example, uses this solution for their random page function. – Emil Vikström Nov 23 '12 at 07:50
  • 1
    This is the only performant way atm. – Eve Freeman Nov 23 '12 at 08:57
  • @CalvinCheng: you should also check for $lt, as you might have rand bigger than any 'random' field. – mrówa Nov 24 '12 at 03:22
  • @mrówa With a billion elements, the likelihood of this happening is almost none. If you want to be sure that you'll always get something, set one random value to 1.0 (of course, this might very slightly skew the results). – Eve Freeman Nov 24 '12 at 04:01
  • @Wes: it doesn't hurt to add check for nullity of random_record & $lt, as the $lt won't fire until such a case is found. I understand the point that it might not happen at all. But why not checking it if it's just that simple? And creating custom solution to such simple case? Why even bother? – mrówa Nov 24 '12 at 21:37
  • Btw, you do need to sort/limit your results to achieve truly random selection. I found this out the hard way today (after having implemented this weeks ago). – Eve Freeman Nov 26 '12 at 10:25
6

Since MongoDB 3.2, this can be done with the aggregate function and the $sample operator, as described in the docs. It's super fast. The following code randomly selects 20 documents from the collection.

db.collection.aggregate( [ { $sample: {size: 20} } ] )

If you need to select random documents that match specific criteria, you can combine it with the $match operator:

db.collection.aggregate([ 
    { $sample: {size: 20} }, 
    { $match:{"yourField": value} } 
  ])

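Beware of the order! On my small database of around 100k documents, the command above takes 15 ms, while with the order switched ($match before $sample) it takes 1750 ms, more than 100 times slower. The likely reason is that with $match first, the server has to find all matching documents before it can sample from them. Note also that with $sample first, $match only filters those 20 random documents, so you get a subset of them (possibly none) rather than 20 matching documents.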
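For the original question (one random document, from Python), a sketch along these lines may help. It assumes `collection` is a pymongo Collection on MongoDB 3.2 or later; the function name `random_matching_document` is hypothetical. It deliberately puts $match first, so that the single sample is drawn from all matching documents, accepting the slower timing reported above for that order.

```python
def random_matching_document(collection, match=None):
    """Return one random document, optionally restricted to those matching `match`."""
    pipeline = []
    if match:
        # Filter first so the sample is drawn from every matching document.
        pipeline.append({"$match": match})
    pipeline.append({"$sample": {"size": 1}})
    # aggregate() returns an iterable cursor; take its first (only) result.
    return next(iter(collection.aggregate(pipeline)), None)
```

Without a `match` argument the pipeline is just the $sample stage, which is the fast case.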

kotrfa
  • I'm new to mongo so apologies in advance for the stupid question: If I used `$sample: {size: 1}` how would I then select only a single key from that random record? – RoyalTS May 04 '17 at 23:18
2

In a performant manner? It is hard, to say the least, without changing your data.

Imagine you try to skip to a rand() position of 1,000,000 in 1b documents. That will be slow, very slow, because MongoDB does not make effective use of indexes when skipping.

As @Calvin said, MongoDB has a feature request for getting random documents, but it is not yet implemented.

The most performant way of doing this at the moment, if you need to do it regularly, is to add an auto-incrementing id to your records (http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field) and use rand() on that.

Edit

To clarify: when using the auto-incrementing id, you will need to do one query initially (unless you keep track of it another way) to get the highest value of the field. You can either query the counter collection, or query the collection itself sorted in reverse (sort({field:-1})) with limit(1), to get the upper bound for rand().

You also need to take changes in the data into account (deleted documents leave gaps, for example), which means you actually want the $gte of that random position.

My idea can be explained more here: php mongodb find nth entry in collection
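In Python, the two-query approach described in this answer could be sketched as follows. It assumes `collection` is a pymongo Collection in which every document has an indexed integer counter field starting at 1. The field name `seq` and the function name are hypothetical.

```python
import random


def random_document_by_counter(collection, field="seq"):
    """Pick a random counter value and return the first document at or above it."""
    # First query: the document with the highest counter value.
    top = collection.find_one({}, sort=[(field, -1)])
    if top is None:
        return None  # empty collection
    # Assumption: counters start at 1 and only grow.
    target = random.randint(1, top[field])
    # $gte tolerates gaps left by deleted documents.
    return collection.find_one({field: {"$gte": target}}, sort=[(field, 1)])
```

Documents that follow a gap are picked more often than others, so the result is only approximately uniform when documents have been deleted.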

Community
Sammaye
1

If your objects have integer ids, you could do something like:

findOne({id: {$gte: rand()}}) 
Errol Fitzgerald