1

I need to select 2 random rows but it's known that rand() is too slow. So I tryed a code from a website and it is:

SELECT *
  FROM bilder AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM bilder)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 2

But this way I get same 2 rows multiple times and parsing is also not correct, so this is complete useless. Is there a working solution which is better that rand()? The table name is bilder the fields are: id, userid, nickname. id is primary and auto increment. Some rows are also deleted so it's not 1 2 3 4 5 but 1 2 4 5 6... so the solution to generate random numbers and select them won't work

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 2
    Define "too slow". And getting the same two rows multiple time *is* random. – Jay Blanchard Oct 14 '14 at 21:25
  • 1
    possible duplicate of [MySQL select 10 random rows from 600K rows fast](http://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast) – TML Oct 14 '14 at 21:28
  • http://jan.kneschke.de/projects/mysql/order-by-rand/ look at the benchmark. By two rows multiple times I mean: first query result: id 2 and 4, second query: id 1 and 3, third query: 2 and 4 AGAIN, fourth: 7 and 9, sixt: 2 and 4 AGAIN –  Oct 14 '14 at 21:31
  • TML I seen that too but its not working right with two random rows :/ –  Oct 14 '14 at 21:33

3 Answers3

2

There are multiple solutions to this problem, but something like the following often has good enough performance:

SELECT b.*
FROM bilder b CROSS JOIN
     (SELECT COUNT(*) as cnt FROM bilder) v
WHERE rand() <= 100 / cnt
ORDER BY rand()
LIMIT 2;

The subquery selects about 100 rows. Sorting such a small number of rows is usually pretty fast. It then chooses two of them.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Ok so when I have 1000 rows it will choose for example 2 rows from 50 - 150 in first query and 2 rows from 300 - 400 in second query and so on? –  Oct 14 '14 at 21:47
  • @user2966167 . . . The `where` clause will choose roughly 100 random rows in the data (this is controllable, but 100 should more more than sufficient for your purposes). The outer `order by` then chooses from thi set. So, although there is a small probably they will be neighboring ids, this is not generally going to happen. – Gordon Linoff Oct 14 '14 at 21:51
0

The most likely cause of your consternation was failing to wrap the RAND() * (SELECT MAX(id) FROM bilder) in a call to CEIL(), resulting in a float instead of an integer:

 SELECT *
  FROM bilder AS r1 JOIN
       (SELECT ceil(RAND() *
                     (SELECT MAX(id)
                        FROM bilder)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 2
TML
  • 12,813
  • 3
  • 38
  • 45
  • Here are the pairs of IDs this query provided me across ~20 runs, using a sample data set where id had an average gap of 3: `((79, 82), (133, 136), (124, 127), (223, 226), (139, 142), (28, 31), (19, 22), (7, 10), (280, 283), (178, 181), (52, 55), (19, 22), (244, 247),(190, 193), (94, 97), (187, 190), (58, 61), (22, 25), (235, 238), (220, 223))` – TML Oct 14 '14 at 21:53
  • I tryed it but there is a problem with parsing, when I echo the id's I get 1,1 4,4 ,3,3 and so on :/ –  Oct 14 '14 at 21:58
0

There are much faster methods of choosing one random row. Both of these methods below choose only one random row. You asked for two random rows. But these methods are orders of magnitude faster than doing a table-scan, so it's worth using these methods even if it takes multiple tries to get a second distinct random row.

The fastest way is to do it in two queries (I'll show in pseudocode):

$max = SELECT MAX(id) FROM bilder
$rand1 = rand(1..$max)-1
SELECT * FROM bilder WHERE id > $rand1 LIMIT 1
$id1 = id of the first row chosen
$rand2 = rand(1..$max)-1
SELECT * FROM bilder WHERE id > $rand2 AND id <> $id1 LIMIT 1
$id2 = id of the second row chosen
if $id2 = $id1, then choose a new $rand2 and query again

The problem with this is that if there are large gaps due to deleted rows, you get a higher chance of choosing the row that follows the gap.

Another fast method if you don't update the table very often is to add a column for consecutive ordering, then assign sequential values to that column in random order:

ALTER TABLE bilder ADD COLUMN rank INT UNSIGNED, ADD KEY (rank);
SET @r := 0;
UPDATE bilder SET rank = (@r:=@r+1) ORDER BY RAND();

Do this ranking once. It will be slow. Then once the rows are ranked, you can pick random value(s) fast:

$max = SELECT MAX(rank) FROM bilder;
$rand1 = rand(1..$max)
$rand2 = rand(1..$max) until $rand2 != $rand1
SELECT * FROM bilder WHERE rank IN ($rand1, $rand2);

Of course if you add or delete any rows from the table, you have to renumber the rows. Or at least you can do this more efficiently:

  • If you insert, then insert the new row with a random value and update the rank of the existing row to $max+1.
  • If you delete, note the rank of the deleted row and update the row with rank of $max to the rank you just deleted.
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • The fastest way is what I did and there was the problem with deleted rows. So the solution with ALTER TABLE... would make id 1 2 3 5 6 to 1 2 3 4 5? –  Oct 14 '14 at 21:45
  • No, you want to leave the existing primary keys alone. I was suggesting adding a new column, with consecutive values. So the rows with id's 1 2 3 5 6 would have random ranks, e.g. 2 5 4 3 1. – Bill Karwin Oct 14 '14 at 22:03