2

I have a PHP file that looks for a random id from a MySQL DB, but when the table is big enough it gets slow. ID row has gaps.

Original

$sql = "SELECT * FROM definiciones ORDER BY rand() LIMIT 1";

Idea

$random = mt_rand(0, 10000);
$sql = "SELECT * FROM definiciones WHERE id = (SELECT max(id) FROM definitiones WHERE id < $random)";

I know the exact amount of rows in the DB beforehand. Is it a good idea to replace the original query?

Ricardo Mehr
  • 310
  • 1
  • 3
  • 12
  • Your question has peaked my curiosity, why are you ordering by a random number? Is it really necessary or isn't there a better way to do the entire thing? – Bobby Axe Oct 07 '18 at 23:13
  • With gaps in the id values, it is possible that you could get no results from the query if there were no remaining `id` values below some number. – Nick Oct 07 '18 at 23:15
  • @BobbyAxe It was made that way to get a random result, but it was a very ineficient way to achieve it. – Ricardo Mehr Oct 07 '18 at 23:16
  • @Nick You are right about that! – Ricardo Mehr Oct 07 '18 at 23:18
  • https://stackoverflow.com/q/1244555/2943403 , https://stackoverflow.com/q/4329396/2943403 – mickmackusa Oct 08 '18 at 00:07
  • Possible duplicate of [ORDER BY RAND() function taking long time to execute in mysql](https://stackoverflow.com/questions/26313771/order-by-rand-function-taking-long-time-to-execute-in-mysql) – mickmackusa Oct 08 '18 at 00:15

1 Answers1

2

Is it a good idea to replace the original query?

Yes, but there's a simpler way of expressing this:

SELECT * FROM definiciones WHERE id >= ? ORDER BY id LIMIT 1

With ? set to a random number between 0 and the maximum ID in the table.


Now, an improvement: If there are any gaps in the values of id, the results from the previous method will be skewed somewhat. (For instance, if there are no rows with id < 100, then a row with id = 100 will be selected much more often than one with id = 101.) You can avoid this by using a separate column for randomization. First, you will need to add the column:

ALTER TABLE definiciones ADD COLUMN randomval FLOAT NOT NULL,
                         ADD KEY randomval (randomval);
UPDATE TABLE definiciones SET randomval = RAND();

Then, to select a fairly chosen random item:

SELECT * FROM definiciones WHERE randomval > ? LIMIT 1;

using a random value between 0 and 1 for the parameter.

There is a small chance that this will return no rows (if RAND() selects a value greater than the highest value in the table). If this happens, repeat the query.

You will need to set randomval = RAND() when inserting new rows into the table.