3

I am trying to get a random row in MySQL. Here is my approach:

SELECT * 
FROM users 
WHERE id = 
 (SELECT floor((max(id) - min(id) + 1) * rand()) + min(id)
  FROM users);

This can return an empty set when there are gaps in the ids. I am fine with that. However,

  1. I don't have gaps and I still get an empty set rather frequently.
  2. More disturbing: I will get two or more results every once in a while.

What is the reason for this strange behavior? How, do I get it right?

Note:

  • the table is quite big (10^6 ... 10^7); I can't use any of the obvious, and known, solutions based on random ordering or even based on using count().
  • I'am stuck with MySQL, I don't care how it's done on the client side ruby/PHP/whatever.
  • I tried to cast the floating types, but it did not help.
  • The corresponding PostgreSQL variant (random instead of rand, and some type casts) works fine.
DrTom
  • 103
  • 7
  • 1
    For how to get (fast) one random row from your table, see Karwin's answer: http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql/213242#213242 – ypercubeᵀᴹ Nov 11 '11 at 15:15
  • You can also read this post by @Quassnoi: http://explainextended.com/2009/03/01/selecting-random-rows/ – ypercubeᵀᴹ Nov 11 '11 at 15:16
  • Can you reproduce these results at the mysql prompt? What are min(id), max(id) and the id chosen in those cases? Can you cut down your data set to a smaller size and reproduce the behavior? – pilcrow Nov 11 '11 at 15:25

5 Answers5

2

I suspect that this is because RAND() is being evaluated for every row in the USERS table in the main query. I suggest:

SELECT u1.* 
from users u1,
     (SELECT floor((max(id) - min(id) + 1) * rand())  + min(id) lid from users) u2
WHERE u1.id > u2.lid
LIMIT 1
1

Try this one -

SELECT t.* FROM
  users t,
  (SELECT @id := (FLOOR((MAX(id) - MIN(id) + 1) * RAND()) + MIN(id)) FROM users) t2
WHERE
  t.id = @id;
Devart
  • 119,203
  • 23
  • 166
  • 186
1

This avoids repeated evaluation of random() during the aggregation and should be faster, too, as min(id) and random() are only evaluated once.

SELECT u.* 
FROM   users u
      ,(SELECT min(id) AS min_id, max(id) AS max_id FROM users) x
WHERE  u.id > (floor((x.max_id - x.min_id + 1) * rand()) + x.min_id)
LIMIT  1;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0
SELECT * FROM users ORDER BY rand() limit 1;

Will return a random row.

Jim H.
  • 5,539
  • 1
  • 24
  • 23
0

How about:

SELECT * 
FROM users 
WHERE id >= (SELECT floor((max(id) - min(id) - 1) * rand())  + min(id) from users)
LIMIT 1

or similar?

Pretty much it should limit you to one result, and if it hits a gap (somehow), then it'll just take the next available one.

Gustav Bertram
  • 14,591
  • 3
  • 40
  • 65