6

which is more efficient (when managing over 100K records):

A. Mysql

SELECT * FROM user ORDER BY RAND();

of course, after that i would already have all the fields from that record.

B. PHP

use memcached to have $cache_array hold all the data from "SELECT id_user FROM user ORDER BY id_user" for 1 hour or so... and then:

$id = array_rand($cache_array);

of course, after that i have to make a MYSQL call with:

SELECT * FROM user WHERE id_user = $id;

so... which is more efficient? A or B?

Andres SK
  • 10,779
  • 25
  • 90
  • 152
  • Duplicate: http://stackoverflow.com/questions/1823306/alerternative-to-mysql-order-by-rand – OMG Ponies Mar 31 '10 at 15:35
  • 2
    @OMG Ponies: It's similar, but not a dupe. He's not asking for the best way to implement it within Mysql, he's asking for a comparison of two methods. – ryeguy Mar 31 '10 at 15:39
  • According to this graph (see link at end), ~100K sorted by RAND() is sub second. I'm curious to see if PHP (much less .NET or J2EE) will provide similar performance: http://stuff.dasprids.de/images/benchmark-random-row-full.png – OMG Ponies Mar 31 '10 at 15:51

3 Answers3

10

The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.

Having said that, ORDER BY RAND() is known to be slow because it's impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().

Your other idea of storing all user_ids in memcached and then selecting a random element form the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues though. Also you're adding some complexity to your application. I would try to look for another way.

I'll give you a third option which might outperform both your suggestions: Select a count(user_id) of the rows in your user table and then have php generate a random number between 0 and the result of count(user_id) minus 1, inclusive. Then do a SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.

Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.

Asaph
  • 159,146
  • 25
  • 197
  • 199
  • this would work great, except that i forgot to specify that the records that can be selected must be filtered with a WHERE flg_pic=1. If i do a count(id_user) the records that have flg_pic=0 would also be considered :s – Andres SK Mar 31 '10 at 15:54
  • @andufo: This is not really a problem at all. You can do `SELECT COUNT(user_id) FROM user WHERE flg_pic=1;` and then `SELECT * FROM user WHERE flg_pic=1 LIMIT 1 OFFSET random-number-generated-by-php;`. – Asaph Mar 31 '10 at 15:57
  • FYI: I realized that the random number must be between `0` and `count(user_id) - 1` inlcusive, and *not* between `1` and `count(user_id)` as I originally stated. I updated my answer. Just wanted to bring that to your attention to avoid a bug. – Asaph Mar 31 '10 at 16:06
  • 2
    @Asaph wow... these are the results with 150k records: OFFSET took 0.0017 secs and BY RAND() took 1.6337 sec -- Thanks! – Andres SK Mar 31 '10 at 16:13
  • Unfortunately, this method won't work when dealing with a table that does not have continuous IDs. You could end up selecting a value that does not exist. – Benjam Aug 01 '11 at 17:07
  • 1
    @Benjam: You're wrong. This method _will_ work with non-continuous ids because it uses the `OFFSET` to choose the random row, _not_ a specific id. – Asaph Aug 01 '11 at 17:18
  • 1
    @Asaph, seems you are correct, thanks for enlightening me. I must have read it too fast. – Benjam Aug 01 '11 at 20:56
4

The first one is incredibly slow because

MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.

It's elaborated more on this blog post.

ryeguy
  • 65,519
  • 58
  • 198
  • 260
0
$random_no = mt_rand(0, $total_record_count);

$query = "SELECT * FROM user ORDER BY __KEY__ LIMIT {$random_no}, 1";
Sangwon Park
  • 454
  • 5
  • 4