11

Let's say I have a collection of users. Is there a way to use Mongoid to find n random users in the collection without returning the same user twice? For now, let's say the user collection looks like this:

class User
  include Mongoid::Document
  field :name
end

Simple huh?

Thanks

GTDev
  • 1
    This is being considered by the MongoDB team. They prioritize issues based on demand; so if you want this feature, check out [Ticket #533: Get random item(s) from Collection](https://jira.mongodb.org/browse/SERVER-533), read up, and vote accordingly. – David J. Jun 15 '12 at 16:47
  • The ticket has been closed and there is now a `$sample` operator for MongoDB. Doesn't seem to be integrated to Mongoid yet, the query has to be done manually. You might also want to have a look at `snapshot` to really avoid duplicates from concurrency. – Cyril Duchon-Doris Apr 17 '16 at 15:30

9 Answers

19

If you just want one document, and don't want to define a new criteria method, you could just do this:

random_model = Model.skip(rand(Model.count)).first

If you want to find a random model based on some criteria:

criteria = Model.scoped_whatever.where(conditions) # query example
random_model = criteria.skip(rand(criteria.count)).first
tothemario
14

The best solution is going to depend on the expected size of the collection.

For tiny collections, just load all of them and call .shuffle.slice(0, n).

For small sizes of n, you can get away with something like this:

result = (0..User.count-1).sort_by { rand }.slice(0, n).collect { |i| User.skip(i).first }

For large sizes of n, I would recommend creating a "random" column to sort by. See here for details: http://cookbook.mongodb.org/patterns/random-attribute/ (also at https://github.com/mongodb/cookbook/blob/master/content/patterns/random-attribute.txt)
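As a rough illustration of the "random column" idea, here is a minimal in-memory sketch. It assumes every document stores a precomputed `random` value between 0 and 1; the `Doc` struct and the `pick_by_random_attribute` helper are hypothetical stand-ins, not part of Mongoid. With Mongoid this would presumably become a range query on that field with a sort and a limit.

```ruby
# In-memory stand-in for a collection whose documents carry a precomputed
# `random` field (hypothetical name), assigned once at insert time.
Doc = Struct.new(:name, :random)

# Returns up to n distinct docs: those at or above a random threshold, in
# `random` order, wrapping around to the lowest values if too few remain.
def pick_by_random_attribute(docs, n, rng = Random.new)
  threshold = rng.rand
  sorted = docs.sort_by(&:random)
  above = sorted.select { |d| d.random >= threshold }
  below = sorted.select { |d| d.random < threshold }
  (above + below).first(n)
end

rng = Random.new(42)
docs = %w[ann bob cat dan eve].map { |name| Doc.new(name, rng.rand) }
picked = pick_by_random_attribute(docs, 3, rng)
```

Because the result is a contiguous run in `random` order, documents with neighbouring `random` values tend to be returned together, so this trades some randomness for a single cheap query.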

Dan Healy
  • 1
    Thanks... this may be overkill but was wondering if there was a simple way of converting that back into a Mongoid::Criteria – GTDev Oct 14 '11 at 02:47
  • SQL has ORDER BY RAND() but as far as I know there is no equivalent of that in mongodb. So you can create the "Random" column and then User.order_by that, which would be a single query. – Dan Healy Oct 14 '11 at 02:54
  • 1
    According to [SO: MongoDB find random dataset performance](http://stackoverflow.com/questions/9434969/mongodb-find-random-dataset-performance), `skip` isn't very efficient for large values: "Skip forces Mongo to walk through the result set until it gets to the document you're looking for, so the bigger the result set of that query, the longer it's going to take." (This supports Dan's answer.) – David J. Jun 15 '12 at 16:51
  • 1
    @DanHealy the link has changed. I only found this one: https://github.com/mongodb/cookbook/blob/master/content/patterns/random-attribute.txt Is this equivalent? – Fernando Kosh Apr 23 '15 at 16:15
  • 1
    @FernandoKosh yes, that looks very similar. Thanks for the update! – Dan Healy Apr 23 '15 at 20:26
8

MongoDB 3.2 comes to the rescue with $sample.

EDIT: The most recent version of Mongoid has implemented $sample, so you can call YourCollection.all.sample(5)

Previous versions of mongoid

Mongoid doesn't support sample until Mongoid 6, so you have to run this aggregate query with the Mongo driver:

samples = User.collection.aggregate([ { '$sample': { size: 3 } } ])
# call samples.to_a if you want to get the objects in memory

What you can do with that

I believe the functionality should make its way to Mongoid soon, but in the meantime:

module Utility
  module_function
  def sample(model, count)
    ids = model.collection.aggregate([ 
      { '$sample': { size: count } }, # Sample from the collection
      { '$project': { _id: 1} }       # Keep only ID fields
    ]).to_a.map(&:values).flatten     # Some Ruby magic

    model.find(ids)
  end
end

Utility.sample(User, 50)
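
If the sample should also respect a filter, one option is to put a `$match` stage in front of `$sample`. The sketch below only builds the pipeline as plain Ruby hashes; `sample_pipeline` is a hypothetical helper, and it assumes you pass in a query hash (for example whatever `criteria.selector` gives you) and hand the result to `collection.aggregate` yourself.

```ruby
# Builds an aggregation pipeline that filters first and then samples.
# `selector` is a plain Hash of query conditions (hypothetical helper input).
def sample_pipeline(selector, size)
  pipeline = []
  pipeline << { '$match' => selector } unless selector.empty?  # optional filter
  pipeline << { '$sample' => { 'size' => size } }              # random sample
  pipeline << { '$project' => { '_id' => 1 } }                 # keep only IDs
  pipeline
end

pipeline = sample_pipeline({ 'registered' => true }, 5)
```

The order matters: matching before sampling means the sample is drawn only from the filtered documents.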
Cyril Duchon-Doris
  • 1
    WARNING: it seems that on some Mongoid versions, calling `.all.sample` resolves to an array before calling sample (instead of performing a MongoDB `$sample`), so `User.all.sample` may load your whole DB in memory before `sample`ing the array. You had better check your mongo log when you implement this to make sure nothing too bad happens. – Cyril Duchon-Doris Jun 14 '19 at 13:45
3

If you really want simplicity you could use this instead:

class Mongoid::Criteria

  def random(n = 1)
    # n distinct offsets; the trailing block-less collect! only produced an Enumerator
    indexes = (0..self.count-1).sort_by{rand}.slice(0,n)

    if n == 1
      return self.skip(indexes.first).first
    else
      return indexes.map{ |index| self.skip(index).first }
    end
  end

end

module Mongoid
  module Finders

    def random(n = 1)
      criteria.random(n)
    end

  end
end

You just have to call User.random(5) and you'll get 5 random users. It'll also work with filtering, so if you want only registered users you can do User.where(:registered => true).random(5).

This will take a while for large collections, so I recommend an alternate method: pick a random slice of the count (e.g. offsets 25,000 to 30,000) and randomize only that range.
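
A minimal sketch of that windowed variant, assuming the offsets it returns are then fed to `skip(offset).first` as above. The helper name `windowed_offsets` and its `window` parameter are illustrative, not part of Mongoid.

```ruby
# Picks n distinct offsets from one randomly placed window of `window` positions
# instead of shuffling every offset in 0...count.
def windowed_offsets(count, n, window, rng = Random.new)
  window = [window, count].min            # window cannot exceed the collection
  n = [n, window].min                     # cannot pick more than the window holds
  start = rng.rand(count - window + 1)    # window always fits inside 0...count
  (start...(start + window)).to_a.sample(n, random: rng)
end

offsets = windowed_offsets(100_000, 5, 5_000, Random.new(1))
```

Only `window` offsets are materialized, but all results come from the same region of the collection, so the selection is less uniform than shuffling every offset.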

Moox
  • Where should I save this file, and under what name? How do I load it? Thank you! – hyperrjas Jan 23 '13 at 22:21
  • @hyperrjas You can put this file in the lib folder of your application. Then make sure your application is configured to autoload the files inside that folder. The name of the file doesn't matter. – Moox Jan 23 '13 at 23:02
  • Thank you. I have added inside `/app/lib` folder the `random.rb` file with this code, but for example, if I run in console `User.random(5)` I get the error `NoMethodError: undefined method `random' for User:Class`. How can I fix this? – hyperrjas Jan 24 '13 at 17:14
  • Your lib folder should be under the application root, not sure if that's what you mean by app, since rails also has an app folder in the application root folder. If you mean APP_ROOT/lib, make sure you have autoload set up in your config file. http://stackoverflow.com/questions/3356742/best-way-to-load-module-class-from-lib-folder-in-rails-3 – Moox Jan 26 '13 at 19:04
2

You can do this by

  1. Generate a random offset that still leaves room to pick the next n elements without running past the end of the collection.
  2. Assume the count is 10 and n is 5.
  3. Check whether n is less than the total count.
  4. If not, set the offset to 0 and go to step 8.
  5. If so, subtract n from the total count, which gives 5.
  6. Pick a random number between 0 and that difference, inclusive (say, 2).
  7. Use that random number, 2, as the offset.
  8. Take the 5 users by passing this offset as the skip and n (5) as the limit.
  9. You get users 3 through 7.

Code:

>> cnt = User.count
=> 10
>> n = 5
=> 5
>> offset = 0
=> 0
>> if n<cnt
>>    offset = rand(cnt-n+1)
>>  end
=> 2
>> User.skip(offset).limit(n)

and you can put this in a method

def get_random_users(n)
  offset = 0
  cnt = User.count
  if n < cnt
    offset = rand(cnt - n + 1) # +1 so the last valid offset (cnt - n) can be chosen
  end
  User.skip(offset).limit(n)
end

and call it like

rand_users = get_random_users(5)

hope this helps

RameshVel
  • Thanks. But will this really be random? I guess this will provide a random range from cnt to cnt+n, but won't this create a dependency? For example, if user 5 is selected, there is a high chance that user 6 will be too, while there is zero chance that user 11 will be. – GTDev Oct 14 '11 at 15:44
  • Right, this is a trade off from my answer. If you can get away with starting at a random spot and just selecting the next n sequential records, then you can perform it in one query rather than n queries. You could then shuffle the result to have it be randomized within that selection. But no, this is not really random. – Dan Healy Oct 14 '11 at 18:38
0

Since I want to keep a criteria, I do:

scope :random, ->{
  random_field_for_ordering = fields.keys.sample
  random_direction_to_order = %w(asc desc).sample
  order_by([[random_field_for_ordering, random_direction_to_order]])
}
apneadiving
0

Just encountered such a problem. Tried

Model.all.sample

and it works for me

  • 10
    Pretty sure this will load every single model from the database and then use the `Array#sample` method to choose a random item. I guess OK if you're just poking around in console, but not recommended for production applications. – steve Aug 22 '13 at 23:47
  • Takes a lot of time if the model has many items – Sairam Oct 11 '13 at 07:16
  • It works, but with more than 20,000 documents, as in my case, it takes too long, as mentioned. – Markus Graf Nov 18 '15 at 15:14
0

The approach from @moox is really interesting, but I doubt that monkeypatching the whole of Mongoid is a good idea here. So my approach is to write a concern, Randomizable, that can be included in each model where you need this feature. This goes in app/models/concerns/randomizable.rb:

module Randomizable
  extend ActiveSupport::Concern

  module ClassMethods
    def random(n = 1)
      indexes = (0..count - 1).sort_by { rand }.slice(0, n)

      return skip(indexes.first).first if n == 1
      indexes.map { |index| skip(index).first }
    end
  end
end

Then your User model would look like this:

class User
  include Mongoid::Document
  include Randomizable

  field :name
end

And the tests....

require 'spec_helper'

class RandomizableCollection
  include Mongoid::Document
  include Randomizable

  field :name
end

describe RandomizableCollection do
  before do
    RandomizableCollection.create name: 'Hans Bratwurst'
    RandomizableCollection.create name: 'Werner Salami'
    RandomizableCollection.create name: 'Susi Wienerli'
  end

  it 'returns a random document' do
    srand(2)

    expect(RandomizableCollection.random(1).name).to eq 'Werner Salami'
  end

  it 'returns an array of random documents' do
    srand(1)

    expect(RandomizableCollection.random(2).map(&:name)).to eq ['Susi Wienerli', 'Hans Bratwurst']
  end
end
Markus Graf
-2

I think it is better to focus on randomizing the returned result set so I tried:

Model.all.to_a.shuffle

Hope this helps.