1

I have a website project that I'm currently coding and I have a table storing information on cars. The table has information such as below, there is about 20 columns in the table, I just want to know what will be the fastest way to get all the column data from a row randomly. I know there is a " ORDER BY RAND() " but I have read that it's slow to use and I fear because of the large table that this may cause problems, is there an alternative.

I'm going to use MySQL with InnoDB but this can be changed if selecting random rows can be done more efficiently with an alternative database engine. The website will be coded using PHP(Potentially RubyOnRails) and the database will be filled from a Python Script that will scrape car information websites.

Table contains rows such as

CarTable -

  • Car Name
  • Car Manufacturer
  • Door Numbers
  • ...
  • ...
  • ...
  • See if the following post answers your question: http://stackoverflow.com/questions/1823306/mysql-alternatives-to-order-by-rand – Raj May 23 '13 at 00:13

3 Answers3

1

If the performance of ORDER BY RAND() is too slow, you can do it with two queries. First, do:

SELECT COUNT(*) c FROM CarTable

Then select a random integer between 0 and c-1, in PHP that's:

$rownum = rand(0, $c);

Then, do another query:

SELECT * FROM CarTable
LIMIT :rownum, 1
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • This solution has performance problem. But, i think, it will be fine for 10k rows. – sectus May 23 '13 at 00:33
  • It should be better than `ORDER BY RAND()`. It's O(n) instead of O(n log n). – Barmar May 23 '13 at 00:34
  • Sure. But read [Why does MYSQL higher LIMIT offset slow the query down?](http://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down) – sectus May 23 '13 at 00:45
  • @sectus But ORDER BY RAND() is worse, since it has to read _every_ record (not just the first `$rownum`) AND it has to sort them. To do better, you need to use a DB with `ROWNUM()`, but MySQL doesn't have that. – Barmar May 23 '13 at 00:59
1

You can try this way.

Find out you max id of the table.

 SELECT id FROM car ORDER BY id DESC LIMIT 1;
 -- This result can be cached

Compute rand value between smallest id(or just 1) and max id.

$rand_id = mt_rand(1, $max_id);

Select the nearest row.

SELECT * FROM `car` WHERE id >= ? ORDER BY id LIMIT 1;

But if table has "holes" result would not be "true random". But, I think, it's normal in your case.

sectus
  • 15,605
  • 5
  • 55
  • 97
  • 1
    I would suppose `SELECT MAX(id) FROM car` might be the better bet, but if you're caching it, then shouldn't be too much of an issue. – Brian Moore May 23 '13 at 00:31
  • The main problem that I think may occur is that if I have ten records and delete record 7 and 8 and generate a random number from 0 to my max(10) what happens when 7 or 8 are selected? – Bratislava Bob May 23 '13 at 00:53
  • @BratislavaBob, it will be 9 for 7,8,9. If you need "true random" you should use Barmar solution. It will be faster than ORDER BY RAND(), but slower than my. – sectus May 23 '13 at 01:00
  • Right, this version will be biased towards the elements immediately after gaps, proportional to the size of the gap. – Barmar May 23 '13 at 01:02
0

In Rails, you could do something like this:

class MyModel < ActiveRecord::Base
  def self.random_record
    find :first, offset: rand(count)
  end
end
Chris Cashwell
  • 22,308
  • 13
  • 63
  • 94