16

I have a mongo collection with documents. There is one field in every document which is 0 OR 1. I need to random sample 1000 records from the database and count the number of documents who have that field as 1. I need to do this sampling 1000 times. How do i do it ?

Community
  • 1
  • 1
Aditya Singh
  • 822
  • 3
  • 15
  • 25

5 Answers5

23

For people coming to the answer, you should now use the new $sample aggregation function, new in 3.2.

https://docs.mongodb.org/manual/reference/operator/aggregation/sample/

db.collection_of_things.aggregate(
   [ { $sample: { size: 15 } } ]
)

Then add another step to count up the 0s and 1s using $group to get the count. Here is an example from the MongoDB docs.

dalanmiller
  • 3,467
  • 5
  • 31
  • 38
12

For MongoDB 3.0 and before, I use an old trick from SQL days (which I think Wikipedia use for their random page feature). I store a random number between 0 and 1 in every object I need to randomize, let's call that field "r". You then add an index on "r".

db.coll.ensureIndex(r: 1);

Now to get random x objects, you use:

var startVal = Math.random();
db.coll.find({r: {$gt: startVal}}).sort({r: 1}).limit(x);

This gives you random objects in a single find query. Depending on your needs, this may be overkill, but if you are going to be doing lots of sampling over time, this is a very efficient way without putting load on your backend.

Nic Cottrell
  • 9,401
  • 7
  • 53
  • 76
4

Here's an example in the mongo shell .. assuming a collection of collname, and a value of interest in thefield:

var total = db.collname.count();
var count = 0;
var numSamples = 1000;

for (i = 0; i < numSamples; i++) {
    var random = Math.floor(Math.random()*total);
    var doc = db.collname.find().skip(random).limit(1).next();
    if (doc.thefield) {
        count += (doc.thefield == 1);
    }
}
Stennie
  • 63,885
  • 14
  • 149
  • 175
  • This also answers one other question: that unlike SQL, MongoDB does not have a built in function for this really. Also that skip could (...could) become troublesome for larger random values, depends though. – Sammaye Oct 01 '12 at 13:59
  • Just a warning that calling skip with very large values can cause a lot of work on the server side, and can be pretty slow, right @Stennie? – Nic Cottrell Jun 18 '22 at 16:45
1

I was gonna edit my comment on @Stennies answer with this but you could also use a seprate auto incrementing ID index here as an alternative if you were to skip over HUGE amounts of record (talking huge here).

I wrote another answer to another question a lot like this one where some one was trying to find nth record of the collection:

php mongodb find nth entry in collection

The second half of my answer basically describes one potential method by which you could approach this problem. You would still need to loop 1000 times to get the random row of course.

Community
  • 1
  • 1
Sammaye
  • 43,242
  • 7
  • 104
  • 146
0

If you are using mongoengine, you can use a SequenceField to generate an incremental counter.

class User(db.DynamicDocument):
    counter = db.SequenceField(collection_name="user.counters")

Then to fetch a random list of say 100, do the following

def get_random_users(number_requested):
    users_to_fetch = random.sample(range(1, User.objects.count() + 1), min(number_requested, User.objects.count()))
    return User.objects(counter__in=users_to_fetch)

where you would call

get_random_users(100)
matthewlent
  • 549
  • 4
  • 18