-1

How can I get random posts without scanning the whole database.

As I know if you use MySQL ORDER BY RAND() it will scan the whole database.

If there is any other way to do this without scanning the whole database.

Dharman
  • 30,962
  • 25
  • 85
  • 135
zack
  • 484
  • 3
  • 9
  • 25
  • 2
    Use `ORDER BY RAND() LIMIT 1`.. – Joke_Sense10 Dec 04 '13 at 10:13
  • @Joke That will still sort the whole table, which is exactly what the OP doesn't want. – deceze Dec 04 '13 at 10:20
  • @deceze It doesn't if he is looking for single row result.. – Joke_Sense10 Dec 04 '13 at 10:26
  • @Joke It ***does*** sort the entire table first, then returns the first result from it! All `ORDER BY` clauses do so, otherwise you couldn't get the correct result. It's an expensive operation, it's well known that it's an expensive operation and it's the reason why the OP asked the question. – deceze Dec 04 '13 at 10:27
  • @deceze May be i am wrong but i came to the conclusion from the following reference: http://stackoverflow.com/questions/455476/does-adding-limit-1-to-mysql-queries-make-them-faster-when-you-know-there-will – Joke_Sense10 Dec 04 '13 at 10:28
  • @Joke Think about it here: `SELECT * FROM foo ORDER BY date DESC LIMIT 1`. All rows will have to be sorted first here so the one row that gets returned is the one with the latest date. Maybe that makes it more obvious than the rand example. – deceze Dec 04 '13 at 10:31
  • Not talkin abt `desc` here.It's obvious that if you go in descending order then it will query the entire table and that is the reason in my solution i did not mention descending.. – Joke_Sense10 Dec 04 '13 at 10:34
  • @Joke Forget `DESC`, it doesn't change anything. `ORDER BY ` always works the same way. `ORDER BY RAND()` works by generating a random number *for each row* which MySQL then sorts on. `ORDER BY date` uses the `date` value of each row for the exact same sorting algorithm. You can also do `ORDER BY id + 3`, `ORDER BY 'a'` or any other expression that results in a value that can be sorted. `ASC` or `DESC` just tell it in which direction to sort. – deceze Dec 04 '13 at 10:38

3 Answers3

2

A tiny modification of @squeamish ossifrage solution using primary key values - assumming that there is a primary key in a table with numeric values:

SELECT *
FROM delete_me
WHERE id >= Round(  Rand() *
     ( SELECT Max( id ) FROM test ))
LIMIT 1

For table containing more than 50.000 rows the query runs in a 100 miliseconds:

   mysql> SELECT  id, table_schema, table_name   
          FROM delete_me   
          WHERE id >= Round(  Rand() *         
                   ( SELECT Max( id ) FROM delete_me ))
          LIMIT 1;
    +-----+--------------------+------------+
    | id  | table_schema       | table_name |
    +-----+--------------------+------------+
    | 173 | information_schema | PLUGINS    |
    +-----+--------------------+------------+
    1 row in set (0.01 sec)
krokodilko
  • 35,300
  • 7
  • 55
  • 79
0

A lot of people seem to be convinced that ORDER BY RAND() is somehow able to produce results without scanning the whole table.

Well it isn't. In fact, it's liable to be slower than ordering by column values, because MySQL has to call the RAND() function for each row.

To demonstrate, I made a simple table of half a million MD5 hashes:

mysql> select count(*) from delete_me;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+
1 row in set (0.00 sec)

mysql> explain delete_me;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| txt   | text             | NO   |     | NULL    |                |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.12 sec)

mysql> select * from delete_me limit 4;
+----+----------------------------------+
| id | txt                              |
+----+----------------------------------+
|  1 | 9b912c03d87991b71955a6cd4f81a299 |
|  2 | f1b7ddeb1c1a14265a620b8f2366a22e |
|  3 | 067b39538b767e2382e557386cba37d9 |
|  4 | 1a27619c1d2bb8fa583813fdd948e94c |
+----+----------------------------------+
4 rows in set (0.00 sec)

Using ORDER BY RAND() to choose a random row from this table takes my computer 1.95 seconds.

mysql> select * from delete_me order by rand() limit 1;
+--------+----------------------------------+
| id     | txt                              |
+--------+----------------------------------+
| 446149 | b5f82dd78a171abe6f7bcd024bf662e8 |
+--------+----------------------------------+
1 row in set (1.95 sec)

But ordering the text fields in ascending order takes just 0.8 seconds.

mysql> select * from delete_me order by txt asc limit 1;
+-------+----------------------------------+
| id    | txt                              |
+-------+----------------------------------+
| 88583 | 00001e65c830f5b662ae710f11ae369f |
+-------+----------------------------------+
1 row in set (0.80 sec)

Since the id values in this table are numbered sequentially starting from 1, I can choose a random row much more quickly like this:

mysql> select * from delete_me where id=floor(1+rand()*500000) limit 1;
+-------+----------------------------------+
| id    | txt                              |
+-------+----------------------------------+
| 37600 | 3b8aaaf88af68ca0c6eccff7e61e897a |
+-------+----------------------------------+
1 row in set (0.02 sec)

But in the general case, I would suggest using the method proposed by Mike in the page linked to by @deceze.

Community
  • 1
  • 1
r3mainer
  • 23,981
  • 3
  • 51
  • 88
  • 1. As you pointed out, the only answer provided in your post is inapplicable in the real life. 2. To demonstrate, one have to explain the query, not table. – Your Common Sense Dec 04 '13 at 11:05
0

My suggestion for this kind of requirement is to use an MD5 hash.

  1. Add a field to your DB table, CHAR(32), and create and index for it.
  2. Populate it for every record with an MD5 hash of anything (maybe the value from the ID column or just any old random number, doesn't matter too much as long as each record is different)
  3. Now you can query the table like so:

    SELECT * FROM myTable WHERE md5Col > MD5(NOW()) LIMIT 1
    

This will give you a single random record without having to scan the whole table. The table has a random sort order thanks to the MD5 values. MD5 is great for this because it's quick and randomly distributed.

Caveats:

  • If the MD5 from your SELECT query results in a hash that is larger than the last record in your table, you might get no records from the query. If that happens, you can always re-query it with a new hash.
  • Having a fixed MD5 hash on each record means that the records are in a fixed order. This isn't really an issue if you're only ever fetching a single record at a time, but if you're using it to fetch groups of records, it may be noticable. You can of course correct this if you want by rehashing records as you load them.
Spudley
  • 166,037
  • 39
  • 233
  • 307
  • If you're going to possibly have to use at least two queries anyway, `... LIMIT n, 1` where *n* ≤ `COUNT(*)` seems a lot more straight forward. Interesting approach though. – deceze Dec 04 '13 at 11:17