0

My database has 300 rows at the moment and propably will increse to about 5000 rows during the next years. I want to know how I can select 20 rows randomly best.

I found here MySQL select 10 random rows from 600K rows fast (where it is again refered to http://jan.kneschke.de/projects/mysql/order-by-rand/) that the following code produces a random selection very fast:

SELECT name
  FROM random AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

so in php I tried the following to get 20 rows:

$anfrage    =   "SELECT name
  FROM random AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 20";

 $ergebnis=$db->query($anfrage)
        or die($db->error);
 while($zeile=mysqli_fetch_assoc($ergebnis))print_r($zeile);

But when I run the script I wont get 20 rows most of the time. Actually, the probability to pick 20 different rows out of 300 is about 48,8%.

Can I change the above code to get really 20 rows very quick?

Community
  • 1
  • 1
Adam
  • 25,960
  • 22
  • 158
  • 247
  • your problem is, that r2.id could be 299 and so you would have only two rows WHERE "r1.id >= r2.id" is true – nidomiro Feb 20 '14 at 01:28
  • @niccomatik I guess I have to set Limit to '1' and execute the whole query in a loop. Then after each query I have to check if the gained row is a new one - is that how it works? – Adam Feb 20 '14 at 01:31
  • 2
    even at 50 000 rows a simple random select will be fast. 5000 rows is nothing. MySQL probably wont even use indexes on queries of tables that size much easier to just scan a table so I am not sure why you are worried about speed. – Namphibian Feb 20 '14 at 02:03
  • Funny question, have you tried to apply `RAND() * (SELECT MAX(id) - 20 ...`? :) – Ja͢ck Feb 20 '14 at 06:58

3 Answers3

0

If you read the article you mention in your question, you would find out that there are 3 solutions:

  • execute the Query several times
  • write a stored procedure which is executing the query and stores the result in a temp-table
  • make a UNION

All of them are explained in the article.

Michal Brašna
  • 2,293
  • 13
  • 17
0

The "slow" way of getting 20 random names is this:

SELECT name
FROM random 
ORDER BY rand()
LIMIT 20;

On 300 rows, this might have similar performance to the method that you are using. Have you tried it? I'm not sure about 5,000 rows, but it is worth trying there as well.

Your method is essentially this (the query is a bit simplified):

SELECT name
FROM random r1 CROSS JOIN
     (SELECT RAND() * MAX(id) as id FROM random) r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 20;

You are assuming that r2 is evaluated for each iteration with a different value. That assumption may not be true.

Another approach is to do this:

SELECT name
FROM random r1 CROSS JOIN
     (SELECT count(*) as cnt FROM random) const
WHERE rand() <= 20.0 / cnt;

Unfortunately, this gives an approximate number of rows. About 20 each time. Maybe you really want 20. In that case, do something like doubling the expected number of rows and then using order by/limit:

SELECT name
FROM random r1 CROSS JOIN
     (SELECT count(*) as cnt FROM random) const
WHERE rand() <= 2*20.0 / cnt
ORDER BY rand()
LIMIT 20;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

You could create a shuffled table that you update occasionally:

INSERT INTO random_ids 
SELECT id 
FROM table_name
ORDER BY RAND();

Record the number of random values that were inserted in your application; then use the following:

SELECT * FROM table_name
INNER JOIN (SELECT id 
    FROM random_ids
    LIMIT ?,20
) r1 ON r1.id = table_name.id;

Whereby the limit is determined by your application to be within the range of [0, <count>)

Ja͢ck
  • 170,779
  • 38
  • 263
  • 309