0

I've read a post about how to randomly select a row (with certain conditions) from a SQL table with huge data. They get the max and min of id and generate a random number between them, and get the first row that has a bigger id than that. However, my ids are not distributed evenly, so I didn't get the truly random row. For example if my ids are 1, 100, 101, I'll have little chance to get the two later rows.

But I think of another solution. Instead of getting the max id, I count all the row in my query, get a random number i and select the ith one. The code looks like this

$count_res = $mysqli->query("SELECT COUNT(*) FROM quest WHERE category IN ({$mem['my_cate']})");
$count = $count_res->fetch_array();
$rand_id = rand(0, $count[0] - 1);

$result = $mysqli->query("SELECT * FROM quest WHERE category IN ({$mem['my_cate']}) LIMIT 1 OFFSET $rand_id");

However, i doubt it's effectiveness. Can anyone give me idea about that, or suggest a solution for my case. Thanks.

dvtrung94
  • 43
  • 4
  • 1
    I researched this for a while when I needed a similar query, and this was the same approach that I ended up going with. If you are using MYISAM, the count() will be quick, and you're not using ORDER BY which will help. Have you done some benchmarking? – markdwhite May 26 '13 at 02:40
  • 1
    Using a large `OFFSET` will incur delays, because MySQL has to actually iterate through the result set to get to the requested position (unlike selecting the first ID larger than a given value, where it can examine the index to find the proper row). – Amber May 26 '13 at 02:43
  • @markdwhite: no, i've tested only on my small size table. – dvtrung94 May 26 '13 at 02:59
  • @Amber: so do we have a solution for it? – dvtrung94 May 26 '13 at 02:59
  • @dvtrung94 Not really, beyond adding an indexed and evenly-distributed id column to your data. – Amber May 26 '13 at 03:06
  • Check this out... http://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast lots of good resources. – Orangepill May 26 '13 at 03:08
  • Does MySQL allow something like SELECT ... WHERE RAND() < 1/$count LIMIT 1? Has the downside of potentially selecting no records, though that's quite unlikely, and you can just repeat until you get a result. – Sysyphus May 26 '13 at 03:15
  • @dvtrung - running a big set of test data will be the proof of which one of these solutions best fits your needs. – markdwhite May 26 '13 at 03:46

2 Answers2

1

Alright, I've done some benchmarking. I've created a table with only one column an auto increment id. I then added 1,700,000 records. Since there is only one column I imagine it will be faster than in practice but here is my benchmarking:

Method 1: Select a count of rows and then using PHP to pick a random number and then select based on an offset. (I rigged the offset to be at the end of the table since it would be slower than the beginning fo the table).

Select count: 12ms

Select with offset: 513ms

Total: 525ms

Method 2: Selecting 1 with a RAND() on the whole table.

Total: 2,190ms

WINNER = METHOD 1

Possible Method 3: This is kinda just something I thought up and it wouldn't necessarily work in all cases. So the idea is you get the last auto increment id in the table, generate a random number between 1 and the last auto increment number, then select the first row that is greater than or equal to that id number. You have to do greater than or equal to because it's possible to have missing id numbers.

Select last id: 10.1ms

Select random row: 6.3ms

Total: 16.4ms

chrislondon
  • 12,487
  • 5
  • 26
  • 65
  • Thanks very much. Unfortunately, I can't make distribution of id be even (since I select rows from the table with some conditions). Maybe I'll consider using both methods, based on the size of returned table. – dvtrung94 May 27 '13 at 01:47
  • That's interesting, I didn't consider that with Method 3 the bigger the gap the greater the odds of the row by the gap to be selected – chrislondon May 27 '13 at 04:41
-2

It might be quicker to use something like:

$result = $mysqli->query("SELECT * FROM quest WHERE category IN ({$mem['my_cate']}) ORDER BY Rand() LIMIT 1");

Because it is only using one query and you can skip that top bit. You can benchmark it by trying both ways in a loop a few thousand or whatever number you decide times and compare the microtime() before and after the loop.

  • 4
    order by rand() is probably the most inefficient way to tackle this one. To order by rand the database has to walk the entire dataset and assign a random number to each record just so that it can sort it. – Orangepill May 26 '13 at 03:11