Random results. Which way is faster SQL-query or a flat file?

Question

I need to display some random data (1 row) from MySQL table and performance is very important for me. It doesn't necessarily has to be absolutely unique and super random data, so I have two options:

1) Query the database (my table has > 500 000 rows);

2) Automatically once a week create a flat text file (or php file with an array) from db results with, let's say, 400-500 rows and get some random results from it (probably using requre_once or something like that).

Which way is better/faster?

Thank you.

Some detailed discussion on the issue: http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/ — Alex Brasetvik, Jan 03 '10 at 19:32

score 2 · Answer 1 · answered Jan 03 '10 at 19:32

2

Query the database.

BUT not by querying

SELECT * FROM tablename ORDER BY rand() LIMIT 1

as this assigns a random number to every row, and then returns the maximum.

Instead, count the number of rows, then get a random number within this value and return that row

$numrows = mysql_num_rows();
$r = rand(0, $numrows-1);
$sql = "SELECT * FROM tablename LIMIT $r, 1";

answered Jan 03 '10 at 19:32

Adam Hopkinson

28,281
7
65
99

Note that counting the number of rows is slow with InnoDB. – Alex Brasetvik Jan 03 '10 at 19:34
`mysql_num_rows()` returns the number of rows in a result set, not the number of rows in a table. – Bill Karwin Jan 03 '10 at 19:38

score 1 · Answer 2 · answered Jan 03 '10 at 19:37

1

Definitely don't use the trick many people use:

SELECT * FROM MyTable ORDER BY RAND() LIMIT 1;

That query looks simple but it is sure to be a performance killer.

This might be a quick solution:

SELECT * FROM MyTable WHERE id > RAND() * (SELECT MAX(*) FROM MyTable) LIMIT 1;

This has some anomalies, such as picking rows that follow gaps more frequently. But you said you want fast, not accurate. Note that aggregates like MAX() and COUNT() are slower when using transactional tables like InnoDB, and faster when using MyISAM.

answered Jan 03 '10 at 19:37

Bill Karwin

538,548
86
673
828

i believe your second query would be even slower than your first, as it is generating a rand and counting the number of values in the table for every single row – Adam Hopkinson Jan 03 '10 at 19:42
@Adam: You'd think so, but in reality the 2nd query is more scalable - see: http://stackoverflow.com/questions/1823306/alerternative-to-mysql-order-by-rand – OMG Ponies Jan 03 '10 at 19:46

score 0 · Answer 3 · answered Jan 03 '10 at 19:42

There's plenty of material on how to efficiently get a random result with your query on this site and Google, so that should take care of your "how to properly write the query" question.

To your second part, if you want the same result for the whole week/day/hour, then you could easily write a cronjob that takes the result, writes it to a file, and have your application pull from that file. The next time the cronjob runs it will overwrite the old version of the file, giving your application the new result. This cronjob could even generate your HTML (or whatever) and put the static page in your web hierarchy, allowing you to benefit from your web server's caching. The web server's caching should negate the disk versus database I/O question, and may end up helping you if your database is usually under heavy load.

score 0 · Accepted Answer · answered Jan 03 '10 at 20:19

0

If your table has an incremental unique ID then just

SELECT * FROM table WHERE id = $r

with $r your unique number got by the suggestions above.

answered Jan 03 '10 at 20:19

Gazzer

4,524
10
39
49

Random results. Which way is faster SQL-query or a flat file?

4 Answers4