
Initially the query was fast, but as the row count grew, performance degraded. My table contains 600K rows, and we now face a major performance issue with the RAND() function.

SELECT id, postalCode, location
FROM jobs_feed
WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
ORDER BY RAND() LIMIT 1;
Schecher_1
Chowdary
    Please explain why you think this is an issue with the RAND function, because I think the issue is with the number of records. (and maybe a missing index? Can you add the output of `SHOW CREATE TABLE jobs_feed ` ? ) – Luuk May 22 '22 at 08:25
  • Yes, this question has been asked before, see https://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql, or https://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast – Shadow May 22 '22 at 08:32
  • Tip: `is_updated != 0` is not [_sargable_](https://en.wikipedia.org/wiki/Sargable) . Change to `is_updated = 1` (if that is the only other value). – Rick James Apr 27 '23 at 02:52

3 Answers


MySQL needs to compute a random number for every one of the 600K+ matching rows, just to determine which row sorts first.

A possible workaround is:

SELECT id, postalCode, location
FROM (
   SELECT id, postalCode, location
   FROM jobs_feed
   WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
   ORDER BY id LIMIT 1000 ) x
ORDER BY RAND() LIMIT 1;

This first selects the first 1000 matching records, and then picks a random row from those.

NOTE: This behavior is different from what you were doing, but it is faster because only the first 1000 records need to be fetched.

NOTE2: You can change 1000 to another (bigger) value, as long as the performance stays acceptable (and this alternative way of selecting is acceptable to you).
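The workaround can be sketched end to end. This is a minimal, hypothetical reproduction using SQLite through Python's `sqlite3` (SQLite spells the function `RANDOM()` where MySQL uses `RAND()`); the table name and columns mirror the question, but the data is invented for illustration.

```python
import sqlite3

# In-memory stand-in for the jobs_feed table; schema mirrors the question,
# the 5000 rows of data are made up purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY,
    postalCode TEXT,
    location TEXT,
    check_parsed INTEGER,
    similarjob_status INTEGER,
    is_updated INTEGER)""")
conn.executemany(
    "INSERT INTO jobs_feed VALUES (?, ?, ?, 1, 0, 1)",
    [(i, f"{10000 + i}", f"city-{i}") for i in range(5000)],
)

# Same shape as the answer's query: restrict to the first 1000 matching
# rows by id, then shuffle only that small subset.
row = conn.execute("""
    SELECT id, postalCode, location
    FROM (
        SELECT id, postalCode, location
        FROM jobs_feed
        WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
        ORDER BY id LIMIT 1000
    ) x
    ORDER BY RANDOM() LIMIT 1
""").fetchone()
print(row)  # always one of the first 1000 matching ids
```

Running it repeatedly shows the caveat from the NOTE: the returned `id` is always below 1000, i.e. only the first chunk of the table is ever sampled.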

EDIT: An alternative approach is to use a dynamic (read: RANDOM) offset, as in this answer: https://stackoverflow.com/a/5873705/724039

This should be done in a stored procedure, and would look something like this:

CREATE DEFINER=`root`@`localhost` PROCEDURE `getRandomRow`()
BEGIN
    DECLARE v_offset BIGINT;
    -- Replace the fixed 600000 with the actual number of matching rows in your table
    SET v_offset = FLOOR(RAND() * 600000);
    SELECT id, postalCode, location
      FROM jobs_feed
      WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
      ORDER BY id LIMIT v_offset, 1;
END
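The random-offset idea translates directly to application code, which also makes it easy to replace the hard-coded row count with a real `COUNT(*)`. A minimal sketch, again using SQLite via Python's `sqlite3` as a stand-in (table name and columns follow the question; the data is invented):

```python
import random
import sqlite3

# Stand-in data set; schema names follow the question, values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.executemany(
    "INSERT INTO jobs_feed VALUES (?, ?, ?, 1, 0, 1)",
    [(i, f"{10000 + i}", f"city-{i}") for i in range(5000)],
)

where = "check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0"

# Step 1: count the matching rows instead of hard-coding a total.
total = conn.execute(
    f"SELECT COUNT(*) FROM jobs_feed WHERE {where}").fetchone()[0]

# Step 2: skip a uniformly random number of rows and take the next one.
offset = random.randrange(total)
row = conn.execute(
    f"SELECT id, postalCode, location FROM jobs_feed "
    f"WHERE {where} ORDER BY id LIMIT 1 OFFSET ?",
    (offset,),
).fetchone()
print(row)
```

Note the trade-off: `LIMIT 1 OFFSET n` still has to step over `n` rows, so the cost grows with the offset, but it avoids generating a random number per row the way `ORDER BY RAND()` does.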
Luuk
  • Thanks, Luuk, but I'm looking to fetch a random id from whole 600K records. – Chowdary May 22 '22 at 08:55
  • That restricts you to the first (or last) 1000 rows. It is not practical to use this technique to sometimes get rows from the 'middle' of the table. – Rick James May 23 '22 at 01:01
  • True, that's why I wrote the NOTE; yesterday I did not have enough time to think about whether one could add a dynamic (=random) OFFSET. An answer using a dynamic OFFSET is given here: https://stackoverflow.com/a/5873705/724039 With some changes it could be turned into a random OFFSET with a LIMIT of 1 record. – Luuk May 23 '22 at 17:43

Add a composite index as already discussed, then pick the appropriate algorithm from this set: [Random](http://mysql.rjweb.org/doc.php/random)

Rick James
  • Rick, is this page appropriately closed? I don't see your advice to add a composite index on the other mentioned pages. Either I am missing something, or this page should be reopened, or your advice should be found on the dupe target. No? – mickmackusa Apr 26 '23 at 06:32
  • @mickmackusa - It would be lost in all the Answers; it's the first Comment on the Question: `Here's 8 techniques; perhaps one will work well in your case. – Rick James Jul 5, 2015 at 15:39 ` – Rick James Apr 27 '23 at 02:45
  • Well, I rewrote it to get rid of the inefficient ones -- There are only 5 now in http://mysql.rjweb.org/doc.php/random – Rick James Apr 27 '23 at 02:46
  • I scoured this site when I was developing my blog. Nearly all are inefficient when you have a big table. – Rick James Apr 27 '23 at 02:48

There isn't much you can do to optimize ORDER BY RAND() LIMIT 1 itself, but we can try adding the following index to speed up the WHERE filter:

CREATE INDEX idx ON jobs_feed (check_parsed, similarjob_status, is_updated);

This index, if used, can potentially allow MySQL to discard many records from the query pipeline, leaving behind only a small subset to randomly shuffle.
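Whether the index is actually used can be checked with the database's query-plan output. A small sketch, using SQLite through Python's `sqlite3` as a stand-in for MySQL (SQLite's `EXPLAIN QUERY PLAN` plays the role of MySQL's `EXPLAIN`; the table and index names follow this answer, the schema is assumed from the question):

```python
import sqlite3

# Toy table matching the question's columns, plus the composite index
# from this answer (the CREATE INDEX syntax is the same in MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.execute(
    "CREATE INDEX idx ON jobs_feed "
    "(check_parsed, similarjob_status, is_updated)")

# EXPLAIN QUERY PLAN reports whether the filter is answered via the index:
# the equality tests on the two leading index columns let the planner do
# an index search instead of a full table scan.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id, postalCode, location FROM jobs_feed
    WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
""").fetchall()
print(plan)
```

The plan output mentions `idx`, confirming the optimizer narrows the candidate rows through the index before any random selection happens, which is exactly the "small subset to shuffle" effect described above.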

Tim Biegeleisen