I see a lot of questions in the mySQL tag about how best to select N random items from a very large table. Most of the time the answer seems to boil down to ORDER BY RAND() LIMIT N
, but I believe that this ends up being very inefficient since it has to generate a random number for each row, then re-sort the result set based on this un-indexed field.
My thought is to do something like the following: [written in PHP-esque psuedocode, but should translate to any language]
$rowcount = query("
SELECT COUNT(*) FROM mytable WHERE [condition];
");
$rand_index = rand(0, $rowcount); // random int between 0 and $rowcount
$rand_row = query("
SELECT field1, field2, ...
FROM mytable WHERE [condition]
LIMIT $rand_index, 1
");
// repeat last 2 lines as needed to get N rows.
Shouldn't these queries be much faster than an ORDER BY RAND() LIMIT N
since they will be using the indexes defined in the tables used?