5

I saw many websites saying don't use ORDER BY RAND(), eg http://forge.mysql.com/wiki/Top10SQLPerformanceTips So I run a test, I have tested the speed and performance on 20k records table, 10k records among them have the username="username" :

SELECT username FROM testingspeed WHERE username='username' ORDER BY RAND();

The result :

Showing rows 0 - 29 (10,000 total, Query took 0.0119 sec).
id = 1 
select_type = SIMPLE
table = testingspeed
type = ref
posible_keys = username
key = username
key_len = 32
ref = const
rows = 3225
Extra = Using where; Using index; Using temporary; Using filesort

since it took 0.0119 seconds only to execute the query, it should be very good speed, why people still say DON'T use ORDER BY RAND()? Why 3225 rows are affected only? Why not 10,000 rows are affected?

zac1987
  • 2,721
  • 9
  • 45
  • 61
  • Remove the where cause, add 10M rows and try again. It will show how ORDER BY RAND() slower then alternatives. – J-16 SDiZ Jul 06 '11 at 07:17
  • You rarely retrieve a large amount of records with random order, you tipically do: `ORDER BY RAND() LIMIT 10`. Do test with these kind of retrieves and compare it with other solutions. – Karoly Horvath Jul 06 '11 at 07:21
  • I have found a number of references by doing a search on PHP RAND() that have specified the server speed goes down when you go over around 100k records, before this there are not really many complaints, just comments to not use RAND() in production unless it's for a smaller query, e.g. SELECT username FROM users WHERE gender='male' AND location='antarctica' – Ryan Jul 06 '11 at 07:26
  • @J-16 SDiZ, the performance is still good for the test that u suggested. Result is (30,000 total, Query took 0.0296 sec) and Explain shows 3000 rows are affected. – zac1987 Jul 06 '11 at 07:31
  • @Ryan, for my case, I need to random select to show 15 pictures of friends on the user's profile, so there is a WHERE clause like WHERE username='zac1987", and every user can has 5000 friends only. U said 100k records only got problem, so my 5k records 100% won't have problem right? – zac1987 Jul 06 '11 at 07:34
  • @yi_H, sorry about my careless. So as long as I have WHERE clause to select less than 5000 records, I won't have any problem with ORDER BY RAND() anymore, because 10M records will only causing problem... – zac1987 Jul 06 '11 at 07:38
  • Per the documents I found, for a small query with few results you should be fine but it generally isn't recommended. Why don't you programmaticly create a random number generator, pick 15 numbers between 1 and 5,000 (in this case) then just display these records based on what number record they are in the count? – Ryan Jul 06 '11 at 07:45
  • @Ryan, there are many numbering holes, sometime user delete friends, the ids will skip for other users when other users add new friends... I have read many times about how to select records on numbering holes, it is very hard for me, I don't understand and i don't know how to do it, eg this page http://jan.kneschke.de/projects/mysql/order-by-rand/ – zac1987 Jul 06 '11 at 07:51
  • That's why you would count the records, for example you wouldn't select the IDs as RAND() does but count this is the 5th existing record so this is 1 out of 15, this is the 39th existing record so it is now 2 out of 15 and so on until you have your 15 items to display, this will miss the ID gaps because it isn't trying to find the IDs that no longer exist. Just an idea and untested. – Ryan Jul 06 '11 at 07:58
  • if there are 10k users, the count number should return 10,000. So i need to use php for loop to test start from id=1 until id=10000 to see which id match to the existing record in database, the for loop will stop at 15th match... Is it what u mean? – zac1987 Jul 06 '11 at 08:04
  • @Ryan, i guess i understood ur concept already... cannot use for loop, must use Do While loop... Do while id < 15 matches... Thank you. Anyway, it will be very hard to find a match if there are too many users... imagine so many other user make friends to each other, 10k users mean 10000x harder to find a match. – zac1987 Jul 06 '11 at 09:26
  • You sort of have the idea but not exactly, you would have a loop, that would count the records and match to what number that record is in the count and not worrying about any IDs – Ryan Jul 06 '11 at 16:03

1 Answers1

5

The problem of ORDER BY RAND() is that as your explain tells you the "Using temporary" and the "Using filesort". For each request a temporary table is created and sorted. Thats a pretty heavy operation. It will probably not matter when your database is not under heavy load but it will cost a lot of performance.

fyr
  • 20,227
  • 7
  • 37
  • 53
  • "when your database is not under heavy load", meaning if there are 10k users are active on my website, the database is consider under heavy load? Thus ORDER BY RAND() will cause problem during that moment? May I know what problem will occur during that moment? – zac1987 Jul 06 '11 at 07:40
  • If I query "SELECT id, COUNT(*) FROM testingspeed", explain didn't tell "Using temporary" and didn't tell "Using filesort" too. So If I use the query to save all ids into a php array, then php random select 15 ids from php array, it should solve the problem right? – zac1987 Jul 06 '11 at 07:46
  • Thats right it solves the problem in a different way. But instead of holding a complete array in your memory per user process it would be better to precompute fixed sets by a single process to deliver them cached. – fyr Jul 06 '11 at 07:59
  • Sorry that I don't understand the sentence "precompute fixed sets by a single process to deliver them cached", can u explain to me in more detail about it please ? – zac1987 Jul 06 '11 at 08:07
  • You implement a script which gets e.g. this array holding the 10000 entries. You select 15 random ids 10 times. Then you save these 10 subsets maybe to disk and deliver them randomly to your enduser. – fyr Jul 06 '11 at 08:25
  • A user can has maximum 5000 friends only, so array max entries should be 5000 only, not 10000. I guess what u trying to do is... u save the 10 sets of 15 random ids in a new table in database. Then random select and show a set to the end user, so everytime user refresh the page, php array don't need to re-fetch 5000 entries again? – zac1987 Jul 06 '11 at 08:35
  • Thats one aspect the other one is that the php script needs memory to hold a set/array of 5000 entries. – fyr Jul 06 '11 at 08:37
  • I see. Thank you very much for the info, I will implement your suggested way :) – zac1987 Jul 06 '11 at 08:43
  • How to determine when to re-compute the fix set? Eg, When the user has 50 friends, compute the fix set for 1st time. Then when user has 200 friends, computer fix set for 2nd time. Firstly, I need to keep query count(*) to get total of friends. Then php `if($fren_num > 50 AND $fren_num < 200){ //compute fixed set } elseif ($fren_num < 500){ //compute fixed set } elseif ($fren_num < 5000){ //compute fixed set };` The problem is... it will keep computing the fixed set when the user has 50 friends, 51 friends, 52 friends... it will stop compute fixed set only when the user has 5000 friends. – zac1987 Jul 06 '11 at 09:06
  • Okay, I have figured a solution, happy!!! Just create a new field "got_set" onto the table in database, then add another condition onto the if else statement. Eg, if($fren_num > 50 AND $fren_num < 200 AND $got_set != "50") – zac1987 Jul 06 '11 at 09:35