17

I've recently discovered that Mongo has no SQL equivalent to "ORDER BY RAND()" in its command syntax (https://jira.mongodb.org/browse/SERVER-533)

I've seen the recommendation at http://cookbook.mongodb.org/patterns/random-attribute/ and frankly, adding a random attribute to a document feels like a hack. It also won't work for my case, because it places an implicit limit on any query I want to randomize.

The other widely given suggestion is to choose a random index to offset from. Because of the order that my documents were inserted in, that will result in one of the string fields being alphabetized, which won't feel very random to a user of my site.

I have a couple ideas on how I could solve this via code, but I feel like I'm missing a more obvious and native solution. Does anyone have a thought or idea on how to solve this more elegantly?

Andy Baird
    There is a [feature request to get random items from a collection](https://jira.mongodb.org/browse/SERVER-533) in the MongoDB ticket tracker. If implemented natively, it would likely be the most efficient option. (If you want the feature, go vote it up.) – David J. Jun 17 '12 at 02:33
  • This question has been asked in many forms here on Stack Overflow. The most popular question is [Random record from MongoDB](http://stackoverflow.com/questions/2824157/random-record-from-mongodb) -- it has good responses. That said, I think the best way of thinking about the question is not to think about getting one random document but, rather, randomizing a result set, just like you asked! See [Ordering a result set randomly in Mongo](http://stackoverflow.com/questions/8500266/ordering-a-result-set-randomly-in-mongo) for that. – David J. Jun 17 '12 at 02:43

6 Answers

7

I have to agree: the easiest thing to do is to install a random value into your documents. There need not be a tremendously large range of values, either -- the number you choose depends on the expected result size for your queries (1,000 - 1,000,000 distinct integers ought to be enough for most cases).

When you run your query, don't worry about the random field -- instead, index it and use it to sort. Since there is no correspondence between the random number and the document, you should get fairly random results. Note that collisions will likely result in documents being returned in natural order.
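The pattern is a one-liner per document; here is a minimal sketch (the `posts` collection and the shell queries in the comments are illustrative, not from the question):

```javascript
// Attach a random sort key before insertion; an index on "rand"
// then makes .sort({ rand: 1 }) cheap.
function withRand (doc) {
  return Object.assign({}, doc, { rand: Math.random() })
}

// Hypothetical usage in the mongo shell:
//   db.posts.insert(withRand({ title: 'Hello' }))
//   db.posts.ensureIndex({ rand: 1 })
//   db.posts.find().sort({ rand: 1 }).limit(10)
```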

While this is certainly a hack, you have a very easy escape route: given MongoDB's schema-free nature, you can simply stop including the random field once there is support for random sort in the server. If size is an issue, you could run a batch job to remove the field from existing documents. There shouldn't be a significant change in your client code if you design it carefully.

An alternative option would be to think long and hard about the number of results that will be randomized and returned for a given query. It may not be overly expensive to simply do shuffling in client code (i.e., if you only consider the most recent 10,000 posts).
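For the client-side route, a Fisher-Yates shuffle gives an unbiased permutation of whatever window you fetch; a sketch (the commented-out query is hypothetical usage, not a fixed API):

```javascript
// Fisher-Yates shuffle: uniformly permutes the array in place.
function shuffle (docs) {
  for (var i = docs.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1))
    var tmp = docs[i]
    docs[i] = docs[j]
    docs[j] = tmp
  }
  return docs
}

// Hypothetical usage: fetch a bounded window, then randomize client-side.
//   collection.find({}, { limit: 10000 }).toArray(function (err, docs) {
//     render(shuffle(docs))
//   })
```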

brandon10gen
    Yeah, it's actually reasonable in this case to grab the entire collection and do it all on the client code so that's what I ended up doing. It just *feels* like functionality that should be native to the data storage. – Andy Baird Dec 14 '11 at 21:38
  • I don't think randomness is a very common requirement for databases and it's a bit tricky to implement efficiently in constant time. – Remon van Vliet May 09 '12 at 08:48
  • I'd add too that pagination becomes quite weird with a random ordering. – juanpaco Jun 06 '13 at 17:04
2

You can give this a try: it's fast, works with multiple documents, and doesn't require pre-populating the `rand` field up front; the field populates itself over time:

  1. Add an index on the `rand` field of your collection.
  2. Use find-and-refresh, something like:
// Install packages:
//   npm install mongodb async
// Add index in the mongo shell:
//   db.mycollection.ensureIndex({ rand: 1 })

var mongodb = require('mongodb')
var async = require('async')

// Find n random documents by using "rand" field.
function findAndRefreshRand (collection, n, fields, done) {
  var result = []
  var rand = Math.random()

  // Append documents to the result based on criteria and options.
  // The limit is computed when the step actually runs, so later steps
  // only fetch what is still missing; once we have n docs the query is skipped.
  var appender = function (criteria, options) {
    return function (done) {
      var limit = n - result.length
      if (limit > 0) {
        options.limit = limit
        collection.find(criteria, fields, options).toArray(
          function (err, docs) {
            if (!err && Array.isArray(docs)) {
              Array.prototype.push.apply(result, docs)
            }
            done(err)
          }
        )
      } else {
        async.nextTick(done)
      }
    }
  }

  async.series([

    // Fetch docs with uninitialized .rand.
    // NOTE: You can comment out this step if all docs already have .rand = Math.random()
    appender({ rand: { $exists: false } }, {}),

    // Fetch on one side of the random number.
    appender({ rand: { $gte: rand } }, { sort: { rand: 1 } }),

    // Continue fetching on the other side.
    appender({ rand: { $lt: rand } }, { sort: { rand: -1 } }),

    // Refresh fetched docs, if any.
    function (done) {
      if (result.length > 0) {
        var batch = collection.initializeUnorderedBulkOp({ w: 0 })
        for (var i = 0; i < result.length; ++i) {
          // $set only the rand field; a bare replacement doc would clobber the document.
          batch.find({ _id: result[i]._id }).updateOne({ $set: { rand: Math.random() } })
        }
        batch.execute(done)
      } else {
        async.nextTick(done)
      }
    }

  ], function (err) {
    done(err, result)
  })
}

// Example usage
mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
  if (!err) {
    findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
      if (!err) {
        console.log(result)
      } else {
        console.error(err)
      }
      db.close()
    })
  } else {
    console.error(err)
  }
})
Mirek Rusin
2

What you want cannot be done without picking one of the two solutions you mention. Picking a random offset is a horrible idea if your collection grows beyond a few thousand documents, because the skip(n) operation takes O(n) time. In other words, the higher your random offset, the longer the query will take.

Adding a randomized field to the document is, in my opinion, the least hacky solution there is given the current feature set of MongoDB. It provides stable query times and gives you some say over how the collection is randomized (and allows you to generate a new random value after each query through a findAndModify for example). I also do not understand how this would impose an implicit limit on your queries that make use of randomization.
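The edge case raised in the question (too few documents above the random pivot) is cheap to handle in application code by wrapping around to the other side of the pivot. Simulated in memory below; the two indexed queries it corresponds to are noted in the comments:

```javascript
// Pick up to n values around a random pivot, wrapping to the other
// side if one side runs short. The same trick with real queries:
//   find({ rand: { $gte: pivot } }).sort({ rand: 1 }).limit(n)
//   find({ rand: { $lt: pivot } }).sort({ rand: -1 }).limit(n - got)
function pickAround (sortedRands, pivot, n) {
  var above = sortedRands.filter(function (r) { return r >= pivot }).slice(0, n)
  var below = sortedRands.filter(function (r) { return r < pivot })
    .reverse().slice(0, n - above.length)
  return above.concat(below)
}
```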

Remon van Vliet
    It adds an implicit limit because there may only be a certain amount of documents at any randomly generated number -- e.g., if the random number I draw is 0.9111 there's only going to be a certain number of documents that are going to qualify for the criteria $gte => 0.9111 – Andy Baird Dec 14 '11 at 14:16
  • @Andy, that's only a limit if the number of documents that fit the criteria is less than the amount you require for your application. If you hit that edge case, you simply supplement your set with a new query using a newly generated random number. – Remon van Vliet May 09 '12 at 08:49
  • @Andy, you'd also want to fetch a single document per random value rather than the whole set starting at 0.9111 if you need your set to be truly random (e.g. to avoid the case where the set returned for 0.9111 is 90% the same as that returned for 0.9222). – Remon van Vliet Jun 18 '12 at 10:51
0

One could insert an id field (the $id field won't work because it's not an actual number) and use modulus math to get a random skip. If you have 10,000 records and you want 10 results, you could pick a modulus between 1 and 1,000 at random, such as 253, and request documents where mod(id, 253) = 0; this is reasonably fast if id is indexed. Then randomly sort those results client-side. Sure, they are evenly spaced out instead of truly random, but it's close to what is desired.
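Sketched in memory below; note that MongoDB's `$mod` query operator does this filtering server-side. The modulus of 1,000 here is chosen so 10,000 sequential ids yield exactly 10 hits:

```javascript
// Evenly spaced "random" sample: keep every id divisible by a randomly
// chosen modulus m. With an indexed numeric id field this maps onto
// MongoDB's $mod query operator: find({ id: { $mod: [m, 0] } })
function sampleByModulus (ids, m) {
  return ids.filter(function (id) { return id % m === 0 })
}

// The evenly spaced hits can then be shuffled client-side.
```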

JohnDavid
0

Both of the options seem like imperfect hacks to me: the random field will always have the same value, and skip will return the same records for the same number.

Why not sort on a random field and then skip randomly as well? I admit it is also a hack, but in my experience it gives a better sense of randomness.

Imran Naqvi
0

> The other widely given suggestion is to choose a random index to offset from. Because of the order that my documents were inserted in, that will result in one of the string fields being alphabetized, which won't feel very random to a user of my site.

Why? If you have 7,000 documents and you choose three random offsets between 0 and 6,999, the chosen documents will be random, even if the collection itself is sorted alphabetically.
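Drawing the offsets themselves is trivial; a sketch (the `collection.count()` and `skip()` calls in the comments are hypothetical usage, and are also where the cost noted in the comment below comes from):

```javascript
// Draw k distinct random offsets in [0, total).
function randomOffsets (total, k) {
  var seen = {}
  var offsets = []
  while (offsets.length < k && offsets.length < total) {
    var o = Math.floor(Math.random() * total)
    if (!seen[o]) {
      seen[o] = true
      offsets.push(o)
    }
  }
  return offsets
}

// Hypothetical usage: total comes from collection.count(), then each
// offset costs one query -- collection.find().skip(offset).limit(1) --
// which is why this gets expensive for large samples.
```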

Thilo
    Your solution actually works, but requires a lot of sub-queries to count the full collection size and then to manually pluck the random offsets. This is definitely troublesome if I want to pull a large amount of records in random order. – Andy Baird Dec 14 '11 at 07:08