32

So far, the "common" way to get a random record from the Database has been:

# Postgress
Model.order("RANDOM()").first 

# MySQL
Model.order("RAND()").first

But, when doing this in Rails 5.2, it shows the following Deprecation Warning:

DEPRECATION WARNING: Dangerous query method (method whose arguments are used as raw SQL) called with non-attribute argument(s): "RANDOM()". Non-attribute arguments will be disallowed in Rails 6.0. This method should not be called with user-provided values, such as request parameters or model attributes. Known-safe values can be passed by wrapping them in Arel.sql().

I am not really familiar with Arel, so I am not sure what would be the correct way to fix this.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
Daniel
  • 4,051
  • 2
  • 28
  • 46

3 Answers3

52

If you want to continue using order by random() then just declare it safe by wrapping it in Arel.sql like the deprecation warning suggests:

Model.order(Arel.sql('random()')).first # PostgreSQL
Model.order(Arel.sql('rand()')).first   # MySQL

There are lots of ways of selecting a random row and they all have advantages and disadvantages but there are times when you absolutely must use a snippet of SQL in an order by (such as when you need the order to match a Ruby array and have to get a big case when ... end expression down to the database) so using Arel.sql to get around this "attributes only" restriction is a tool we all need to know about.

Edited: The sample code is missing a closing parentheses.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • how is it more safe to declare it with `Arel.sql`? `.order('RAND())'` seems totally fine to me - could you elaborate? – davegson Nov 01 '18 at 11:30
  • 1
    @davegson If you don't wrap it in an `Arel.sql` call you get a deprecation warning (that will turn into an exception in the next version) so you don't really have much choice. Just wrapping something in `Arel.sql` won't make anything safer, it just makes you work a little harder to shoot your foot off. – mu is too short Nov 01 '18 at 14:52
  • 3
    I love how much less code this is. Now I have to tell AR that I'm using sql. – baash05 Nov 27 '18 at 03:54
  • 1
    @baash05 You could use even less code and drastically improve the readability by doing it all with AREL ;) – mu is too short Nov 27 '18 at 06:59
  • 1
    That made me laugh – baash05 Nov 29 '18 at 22:07
  • Model.order(Arel.sql('RAND()')).first worked for me. random gave an error. unknown mysql function – Shuaib Zahda Oct 31 '20 at 07:30
  • @ShuaibZahda Right, `random()` is for PostgreSQL, `rand()` is for MySQL. – mu is too short Oct 31 '20 at 16:52
5

I'm a fan of this solution:

Model.offset(rand(Model.count)).first
Anthony L
  • 2,159
  • 13
  • 25
  • 5
    Makes sense. The only problems I see to this solution are that it hits the DB twice, instead of once, and that it could theoretically fail due to race conditions. – Daniel Feb 21 '18 at 02:19
  • 6
    @Daniel But to be fair, `order by random()` can be very expensive if it is working with a large result set. – mu is too short Feb 21 '18 at 02:21
  • 1
    Yes. That's right! I guess that it depends on the size of the table and, also, the latency to the DB server. It's good to know that both options exist. – Daniel Feb 21 '18 at 04:33
  • 3
    Note that Model.count can take a *long* time for large postgres tables. – kwerle Apr 13 '18 at 19:05
  • 1
    @muistooshort I have found RAND() is _much_ faster: https://stackoverflow.com/a/47178248/1651458 – Sam Apr 29 '19 at 13:05
3

With many records, and not many deleted records, this may be more efficient. In my case I have to use .unscoped because default scope uses a join. If your model doesn't use such a default scope, you can omit the .unscoped wherever it appears.

Patient.unscoped.count #=> 134049

class Patient
  def self.random
    return nil unless Patient.unscoped.any?
    until @patient do
      @patient = Patient.unscoped.find rand(Patient.unscoped.last.id)
    end
    @patient
  end
end

#Compare with other solutions offered here in my use case

puts Benchmark.measure{10.times{Patient.unscoped.order(Arel.sql('RANDOM()')).first }}
#=>0.010000   0.000000   0.010000 (  1.222340)
Patient.unscoped.order(Arel.sql('RANDOM()')).first
Patient Load (121.1ms)  SELECT  "patients".* FROM "patients"  ORDER BY RANDOM() LIMIT 1

puts Benchmark.measure {10.times {Patient.unscoped.offset(rand(Patient.unscoped.count)).first }}
#=>0.020000   0.000000   0.020000 (  0.318977)
Patient.unscoped.offset(rand(Patient.unscoped.count)).first
(11.7ms)  SELECT COUNT(*) FROM "patients"
Patient Load (33.4ms)  SELECT  "patients".* FROM "patients"  ORDER BY "patients"."id" ASC LIMIT 1 OFFSET 106284

puts Benchmark.measure{10.times{Patient.random}}
#=>0.010000   0.000000   0.010000 (  0.148306)

Patient.random
(14.8ms)  SELECT COUNT(*) FROM "patients"
#also
Patient.unscoped.find rand(Patient.unscoped.last.id)
Patient Load (0.3ms)  SELECT  "patients".* FROM "patients"  ORDER BY "patients"."id" DESC LIMIT 1
Patient Load (0.4ms)  SELECT  "patients".* FROM "patients" WHERE "patients"."id" = $1 LIMIT 1  [["id", 4511]]

The reason for this is because we're using rand() to get a random ID and just do a find on that single record. However the greater the number of deleted rows (skipped ids) the more likely the while loop will execute multiple times. It might be overkill but could be worth a the 62% increase in performance and even higher if you never delete rows. Test if it's better for your use case.

lacostenycoder
  • 10,623
  • 4
  • 31
  • 48