5
mysql> EXPLAIN SELECT * FROM urls ORDER BY RAND() LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
|  1 | SIMPLE      | urls  | ALL  | NULL          | NULL | NULL    | NULL | 62228 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+

The above doesn't qualify as efficient,how should I do it properly?

UPDATE

Seems using the solution mentioned in the answer still doesn't help:

mysql> explain SELECT  *
    -> FROM    (
    ->         SELECT  @cnt := COUNT(*) + 1,
    ->                 @lim := 10
    ->         FROM    urls
    ->         ) vars
    -> STRAIGHT_JOIN
    ->         (
    ->         SELECT  r.*,
    ->                 @lim := @lim - 1
    ->         FROM    urls r
    ->         WHERE   (@cnt := @cnt - 1)
    ->                 AND RAND(20090301) < @lim / @cnt
    ->         ) i;
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
| id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows  | Extra                        |
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |     1 |                              |
|  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL |    10 |                              |
|  3 | DERIVED     | r          | ALL    | NULL          | NULL | NULL    | NULL | 62228 | Using where                  |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL |  NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
user198729
  • 61,774
  • 108
  • 250
  • 348

2 Answers2

4

Quassnoi has written a post about selecting rows at random without performing a sort. His example selects 10 rows at random, but you can adapt it to select just one row.

If you want it to be really fast then you can use an approximation that won't be completely uniform or will sometimes fail to return a row.

You can also use a stored procedure to select a random row quickly from Bill Karwin's post:

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

Note that this will run much faster in MyISAM than InnoDB because COUNT(*) is expensive in InnoDB but nearly instant in MyISAM.

Community
  • 1
  • 1
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • @user198729: Looking at your code above you forgot to change `lim` from 10 to 1. Maybe I didn't make that clear enough. – Mark Byers Apr 25 '10 at 09:27
  • It doesn't matter,still the same after changing 10 to 1.Pay attention to the 3rd line of explain output. – user198729 Apr 25 '10 at 09:28
  • @user198729: How many rows do you have? What is your storage engine (InnoDB or MyISAM)? What is the running time of the query in seconds? – Mark Byers Apr 25 '10 at 09:32
  • It's `MyISAM`,with `62229` rows. – user198729 Apr 25 '10 at 09:33
  • @user198729: What's the running time of the query in seconds? – Mark Byers Apr 25 '10 at 09:34
  • It's 0.17 secs,with the original `order by rand()` 0.13 secs – user198729 Apr 25 '10 at 09:35
  • @user198729: 0.17s is pretty fast given that it uses a table scan. Note that the first method is O(n log(n)) and the second is O(n). The lack of difference in the results is probably because you don't have any data in your database so both are very fast. I think you should notice a significant difference if you put some test data into your database. At 1 million rows you should notice a 10x speed up. How many rows do you plan to have? – Mark Byers Apr 25 '10 at 09:41
  • So O(n) is the best possible performance already? I was expecting some O(log(n)) ones. As for how many rows do I plan to have,there's no limit,it grows with the time. – user198729 Apr 25 '10 at 09:45
  • @user198729: If your PK is almost sequential with few deletions you can choose a random r between 1 and max(id) and find the first row that has `id >= r`. This will be close to instant but introduces a bias. – Mark Byers Apr 25 '10 at 09:49
  • @user198729: Or since you are using MyISAM you can count the number of rows, choose a random number `r` between `0` and `COUNT(*) - 1` and use a `LIMIT r, 1`. – Mark Byers Apr 25 '10 at 09:53
  • Oh I can't assume the sequential.Seems I have to go with O(n) anyway.But I don't understand what `RAND(20090301)` in @Quassnoi's solution means? – user198729 Apr 25 '10 at 09:53
  • @user198729: That number is a seed to make the example reproducible. Don't include the seed in your real system. – Mark Byers Apr 25 '10 at 09:55
  • Oops,still an issue: `RAND(20090301) < @lim / @cnt`,this condition might fail all rows,right? – user198729 Apr 25 '10 at 10:08
  • @user198729: No, it will succeed for exactly @lim rows. The value of @lim / @cnt changes for each row (because the values of @lim and @cnt change). – Mark Byers Apr 25 '10 at 10:22
  • I think `RAND(20090301) < @lim / @cnt` should been changed to `RAND(20090301) <= @lim / @cnt`(`<` -> `<=`) to ensure it always returns exactly @lim rows. Though practically it may never happen. What do you think? – user198729 Apr 25 '10 at 10:33
  • @user198729: See the documentation for RAND: http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand `Returns a random floating-point value v in the range 0 <= v < 1.0.` Note that it is `< 1.0` not `<= 1.0`. Exactly 1.0 can never be returned by `RAND()`. – Mark Byers Apr 25 '10 at 10:38
  • Oh then it's correct:) In fact I found this solution more efficient and without bias,but only apply for random **1** record: http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql/213242#213242 – user198729 Apr 25 '10 at 10:40
  • Simply put, the `order by` is not necessary to select a random row. – user198729 Apr 25 '10 at 10:56
  • @user198729: OK, that was actually one of my suggestions above: `since you are using MyISAM you can count the number of rows, choose a random number r between 0 and COUNT(*) - 1 and use a LIMIT r, 1`. I've put it into the main post now so that people searching will find it more quickly. – Mark Byers Apr 25 '10 at 11:01
0

Well, If you can move some logic to application layer (and I didn't misunderstood your question), then all you need is to generate random ID in your application and then perform simple select for one record identified by that key. All you need to know is count of records. Oh, and if that key was deleted, get next one.

michajas
  • 347
  • 1
  • 5
  • 16