0

I see a lot of questions in the mySQL tag about how best to select N random items from a very large table. Most of the time the answer seems to boil down to ORDER BY RAND() LIMIT N, but I believe that this ends up being very inefficient since it has to generate a random number for each row, then re-sort the result set based on this un-indexed field.

My thought is to do something like the following: [written in PHP-esque psuedocode, but should translate to any language]

$rowcount = query("
    SELECT COUNT(*) FROM mytable WHERE [condition];
");
$rand_index = rand(0, $rowcount); // random int between 0 and $rowcount
$rand_row = query("
    SELECT field1, field2, ...
    FROM mytable WHERE [condition]
    LIMIT $rand_index, 1
");
// repeat last 2 lines as needed to get N rows.

Shouldn't these queries be much faster than an ORDER BY RAND() LIMIT N since they will be using the indexes defined in the tables used?

Sammitch
  • 30,782
  • 7
  • 50
  • 77
  • 1
    Possibly related: http://stackoverflow.com/questions/1244555/how-can-i-optimize-mysqls-order-by-rand-function – beny23 Jan 09 '13 at 20:32
  • Possibly related: http://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast – beny23 Jan 09 '13 at 20:33
  • Yes. It looks like a good idea. – Bohemian Jan 09 '13 at 20:34
  • The problem with your solution is that each query would be a new I/O hit to the DBMS, meaning if I need 30 random rows, it's 31 hits to the DB. I would look at @beny23's related posts. – Jordan Kasper Jan 10 '13 at 20:00

0 Answers0