I need to select random records from the database. In SQLite3, which I use in development, there is a function called RANDOM(). However, in PostgreSQL it's called RAND(). I don't remember what MySQL uses, but it's probably called that there too.

So if I have code like this (for SQLite3):

data = Items.where(published: is_published).order("RANDOM()").limit(count)

how do I ensure that it will work with different databases?

Alan Coromano
  • probably related http://stackoverflow.com/questions/5342270/rails-3-get-random-record – waldyr.ar Dec 20 '12 at 12:43
  • I have added my answer below, but I should point out that your question is wrong: SQLite3 and PostgreSQL both use `random()`; only MySQL uses `rand()`. – Lee Jarvis Dec 20 '12 at 12:48
  • I'm thinking more about a Railsy solution to your problem, but note that you've just run into the reason that you should use the same DB system for development and production. Install Postgres locally for development; it's worth it. – Marnen Laibow-Koser Dec 20 '12 at 15:52
  • @AlanDert What doesn't work for you? Local Postgres? If so, take the time to get it working: the installation can be tricky, but you really want to be running the same DB in both development and production. – Marnen Laibow-Koser Dec 25 '12 at 16:59
  • I should mention, however, that making sure your code is non-DB-specific is still a *very* good practice – Marnen Laibow-Koser Dec 25 '12 at 17:05

3 Answers

Rails doesn't support this out of the box. I believe I achieved this with a model extension (I don't use it anymore because I now use PostgreSQL exclusively), but something like this could work:

module Randomize
  extend ActiveSupport::Concern

  included do
    scope :random, -> { order(rand_cmd) }
  end

  module ClassMethods
    def rand_cmd
      if connection.adapter_name =~ /mysql/i
        'rand()'
      else
        'random()'
      end
    end
  end
end

You can then do:

class Item < ActiveRecord::Base
  include Randomize
end

Item.where(...).random.limit(...)
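The adapter check inside `rand_cmd` is easy to try outside Rails. The following is a plain-Ruby sketch that needs only the standard library: `random_function_for` is a hypothetical helper (not a Rails API) that takes the kind of string `connection.adapter_name` typically returns for the common adapters, such as `"Mysql2"`, `"PostgreSQL"` or `"SQLite"`.

```ruby
# Plain-Ruby sketch of the dispatch done by Randomize.rand_cmd.
# random_function_for is a hypothetical helper, not part of Rails.
def random_function_for(adapter_name)
  # MySQL (including the mysql2 adapter) spells it RAND();
  # SQLite and PostgreSQL both use RANDOM().
  adapter_name =~ /mysql/i ? 'rand()' : 'random()'
end

%w[Mysql2 PostgreSQL SQLite].each do |name|
  puts "#{name}: ORDER BY #{random_function_for(name)}"
end
```

Anything that doesn't match `/mysql/i` falls through to `random()`, so an adapter that spells the function differently would need its own branch.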
Lee Jarvis
  • The performance implications are kind of a big deal. I can't imagine someone choosing to do anything like this in production. – pguardiario Dec 20 '12 at 12:51
  • @pguardiario I don't see why anyone would need to use this in production anyway. That said, performance implications of what? A method call? – Lee Jarvis Dec 20 '12 at 12:55
  • I mean the performance implications of something like `order by rand()`, which unless I'm mistaken, will require a full disk read every time. – pguardiario Dec 20 '12 at 13:36
  • @pguardiario Yes, you're right, `order by random()` is not performant, but that's beyond the scope of this discussion. That said, for reference: http://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/ – Lee Jarvis Dec 20 '12 at 13:47

For a performant, adapter-independent way to order randomly, populate a column with random values, put an index on it, and query it with something like:

Foo.order("random_column > #{rand}").limit(1)
pguardiario
  • Downvoting because this will give you the same ordering each time. Also, I *think* that once you do the subtraction, the index won't be used, so this doesn't help performance. – Marnen Laibow-Koser Dec 20 '12 at 15:50
  • Interesting, but I still don't think this is a very good solution: your `ORDER BY` clause will evaluate to `TRUE` for some of the records, and `FALSE` for the others. You'll always have the lower values of `random_column` coming closer to the beginning than the higher; only the cutoff point between those divisions will be random. Within each division, the ordering will be unpredictable. This is barely better than no `ORDER BY` at all, and may even be worse, because it forces low values of `random_column` closer to the front, so the randomness will be quite biased. – Marnen Laibow-Koser Dec 21 '12 at 08:17
  • Feel free to fix it. My brain isn't working right now but this is along the right track to the proper solution. – pguardiario Dec 21 '12 at 08:35
  • I can't think of a good way to get the proper solution with this line of thinking. If you think this is the right track to the proper solution, I'd be interested to see you (or anyone else) take it the rest of the way there. I'll keep thinking, though. – Marnen Laibow-Koser Dec 21 '12 at 08:53
  • Hmm. `WHERE random_column = rand()`? Of course, that only works if the random values can be guaranteed consecutive, which they generally can't be. So that won't work. – Marnen Laibow-Koser Dec 21 '12 at 16:02
  • Years later: `ORDER BY abs(random_column - rand())`? – Marnen Laibow-Koser Mar 26 '19 at 15:44
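To make the objection in the comments concrete, here is a plain-Ruby simulation in which an in-memory array stands in for the table; `Row`, `order_by_gt` and `pick_one` are hypothetical names used only for illustration. `order_by_gt` mimics the answer's `ORDER BY random_column > r`, which only splits the rows into a FALSE group and a TRUE group. `pick_one` mimics a commonly used index-friendly variant that is not from this thread: take the first row with `random_column >= r` in `random_column` order, wrapping around to the smallest value when nothing qualifies.

```ruby
# In-memory stand-in for a table with an indexed random_column.
# Row, order_by_gt and pick_one are hypothetical names for illustration.
Row = Struct.new(:id, :random_column)

# Emulates ORDER BY (random_column > r): each row maps to one of only two
# keys (SQL sorts FALSE before TRUE), so the result is just two groups
# whose internal order is unspecified.
def order_by_gt(rows, r)
  rows.sort_by { |row| row.random_column > r ? 1 : 0 }
end

# Emulates, roughly:
#   SELECT * FROM items WHERE random_column >= r
#   ORDER BY random_column LIMIT 1
# and wraps around to the smallest value when nothing is >= r.
def pick_one(rows, r)
  sorted = rows.sort_by(&:random_column)
  sorted.find { |row| row.random_column >= r } || sorted.first
end

rows = [0.1, 0.4, 0.7, 0.9].each_with_index.map { |v, i| Row.new(i + 1, v) }

p order_by_gt(rows, 0.5).map(&:random_column) # 0.1 and 0.4 first, then 0.7 and 0.9
p pick_one(rows, 0.5).id  # => 3 (random_column 0.7)
p pick_one(rows, 0.95).id # => 1 (wraps around to 0.1)
```

A range condition plus `ORDER BY random_column LIMIT 1` can typically be served by an index on `random_column`, so this variant avoids sorting the whole table. It is not perfectly uniform, though: each row's chance of being picked is proportional to the gap just below its value (wrapping around for the smallest row), so it trades some randomness for speed.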

From the comments on the post that waldyr.ar links to in his comment: https://stackoverflow.com/a/12038506/16784.

TL;DR: you can use Items.all.sample(count). Of course, that loads the entire table into memory, so it may not be practical for large tables.
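The sampling itself happens in Ruby on the loaded records. The sketch below uses a plain array of integers as a stand-in for those records (an assumption; no database is involved) to show that `Array#sample(n)` returns `n` distinct elements chosen in memory, which is why the whole table has to be fetched first.

```ruby
# Stand-in for every row Items.all would load (hypothetical data).
records = (1..1_000).to_a

# Seeding via the random: keyword only makes runs repeatable.
rng = Random.new(42)

# Pick 5 distinct records; all 1,000 are already in memory at this point.
picked = records.sample(5, random: rng)
p picked

# Asking for more elements than exist returns all of them, shuffled.
p [1, 2, 3].sample(10).size # => 3
```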

Confusion
  • Fetching all items is a bad idea. – Alan Coromano Dec 20 '12 at 13:29
  • Ordering by `RAND()` is also awful in general. Whether this is useful depends on the intended use: for small tables or, more likely, small selections from tables, the `sample` method is very useful. If performance is of the essence, more elaborate strategies are needed. This is a pragmatic solution; I'll happily answer the question on how to increase performance, if it ever comes. – Confusion Dec 20 '12 at 16:43