1

I have a query which consists of joins and has an output of about 30,000+ records. I would like to get 10 random rows from these records without using SQL command ORDER BY rand().

I tried looping through the records and put them in an array and shuffle (get the 1st 10 after shuffle) them but it takes about 8-12 seconds to generate. I would like to reduce this processing time at all cost.

How will I accomplish this?

acidpaul
  • 165
  • 2
  • 14
  • And what's your current query? – Tom Walters Mar 15 '13 at 14:59
  • And why don't you want to use rand()? – creinig Mar 15 '13 at 15:00
  • I'm not sure it's possible to get random rows from only the query without using rand. Are you using MySQL? – chawkinsuf Mar 15 '13 at 15:01
  • So - lets get this right - you want a "random" rows without using a function that generates random numbers. You can move the problem from SQL or to PHP or to ... Anyway you will have the same hurdle. – Ed Heal Mar 15 '13 at 15:04
  • I did some searching -- http://stackoverflow.com/a/12852040/1814922 has some pretty intriguing answers. Haven't tried them though. – creinig Mar 15 '13 at 15:20
  • I tried popnoodles' method and cut off some 3-4 seconds generation time. I believe doing the random in PHP will give a better loading time compared to ORDER BY rand() in MYSQL. – acidpaul Mar 15 '13 at 17:16

1 Answers1

8

Ok if you don't want to use RAND() do a much faster select of just the id column first then select only the 10 rows you want

  1. Select the IDs into $idarray
    SELECT id FROM table
  2. Shuffle and slice $idarray
    shuffle($idarray);
    $ids=array_slice($idarray,0,10);
  3. Select the complete rows
    $sql="SELECT ... WHERE id IN (".implode(', ', $idarray).")";

Edit: This is certainly much faster than using ORDER BY RAND()!

Popnoodles
  • 28,090
  • 2
  • 45
  • 53
  • 1
    Fetching 30000+ IDs (and growing) from the database, shuffling them and then throwing away 29990+ of them will certainly be much slower than letting the RDBMS doing the randomizing... – creinig Mar 15 '13 at 15:08
  • I've just tested selecting that amount of IDs from a database and ORDER BY RAND() LIMIT 10 of *just the id column* takes 30+ times longer than without ORDER BY and LIMIT. – Popnoodles Mar 15 '13 at 15:11
  • ... selecting 30 columns using ORDER BY and LIMIT takes over 1000 times longer. PHP takes 0.003 seconds to shuffle and slice 30,000 ids. – Popnoodles Mar 15 '13 at 15:28
  • Thanks for you response. I tried your method and saved me 3-4 seconds generation time. This may sound silly but is there a method to just pick some rows in the MYSQL result without looping through the records one by one and also of course not using ORDER BY rand()? – acidpaul Mar 15 '13 at 17:19
  • Not that I'm aware of. You have RAND() and LIMIT, unlike PHP there isn't usually more than one way to do something. – Popnoodles Mar 15 '13 at 19:50
  • I think your idea is the best way for now. I try to improve my query so that the performance will improve too. Thanks for the help. Much appreciated. – acidpaul Mar 18 '13 at 13:18