1

I have this query:

$sql = "SELECT 
    catalogs_values.name as word, catalogs.name as catalog
FROM
    catalogs_values
        INNER JOIN
    catalogs ON catalogs_values.catalog_id = catalogs.id
WHERE
    catalogs_values.id NOT IN (SELECT 
            valueid
        FROM
            monitor
        WHERE
            userid = $user_id)
        AND catalogs_values.checked = 0
ORDER BY RAND()
LIMIT 1";

In my table I have about 1 million records and my query is very slow. Can you suggest some improvements to it?

Abdul Manaf
  • 4,768
  • 3
  • 27
  • 34
torayeff
  • 9,296
  • 19
  • 69
  • 103

5 Answers5

1

Ordering by Mysql's RAND is ALWAYS slow, I use a very fast way to sort this:

  1. select min and max ID from the table
  2. run a php rand(min_Id, max_Id)
  3. make a loop until you find an id that's really existing, believe me it's really fast if the id is a unique key as it should
  4. once you are sure the ID exists exit the loop, there you have your random ID

    $SQL = " SELECT MAX( id ) as x FROM table ";
    $query = mysql_query($SQL);
    $x = mysql_fetch_assoc($query);
    $max = $x['x'];
    $ok = false;
    
    while($ok == false){
    
        $id = rand(1, $max);
        $SQL = "SELECT id FROM table WHERE id = ".$id." LIMIT 1";
    
        $query = mysql_query($SQL);
        $record = mysql_fetch_assoc($query);
    
        if((int)$record['id'] > 0){
            $ok = true;
        }
    
    }
    
    //your ID is: $record['id'];
    
sathia
  • 2,192
  • 2
  • 24
  • 42
  • 2
    `RAND()` isn't slow. Ordering on `RAND()` can be. Also, getting min and max and picking a random value in between would not have fair distribution if there were any gaps in the numbering. – Marcus Adams Feb 17 '14 at 13:28
  • you are right about RAND being slow on ordering if there are gaps a better approach would be to make as wikipedia does, if I remember correctly there's a column with an unsigned int that's used only for random lookups – sathia Feb 17 '14 at 13:42
1

MediaWiki (think Wikipedia's random article page) does this by assigning every row a random value, adding that to an index, and then using an indexed select:

SELECT * from `some_table` where `my_rand_column` >= RAND() LIMIT 1;
che
  • 12,097
  • 7
  • 42
  • 71
  • It can give zero result, event if there exists some unchecked rows – torayeff Feb 17 '14 at 13:31
  • That's right. I've been peeking into its [source](https://git.wikimedia.org/blob/mediawiki%2Fcore/a3af8ab9199a27a00ea1a9bc509f6cf0bdce46b0/includes%2Fspecials%2FSpecialRandompage.php#L134) for a while, and it looks like they simply ignore the possibility (or always have a fallback page with max value). – che Feb 17 '14 at 14:05
0

Try replacing the not in with a left outer join or not exists:

SELECT cv.name as word, c.name as catalog
FROM catalogs_values cv INNER JOIN
     catalogs c
     ON cv.catalog_id = c.id LEFT JOIN
     monitor m
     on cv.id = m.valueid and userid=$user_id
WHERE m.valueid is null and cv.checked = 0
ORDER BY RAND()
LIMIT 1;

That might solve the performance problem.

If it doesn't you might need another way to get a random row. One simplish approach is to select a subset of random rows and then choose just one:

select word, catalog
from (SELECT cv.name as word, c.name as catalog
      FROM catalogs_values cv INNER JOIN
           catalogs c
           ON cv.catalog_id = c.id LEFT JOIN
           monitor m
           on cv.id = m.valueid and userid=$user_id
      WHERE m.valueid is null and cv.checked = 0 and rand() < 0.001
     ) t
ORDER BY RAND()
LIMIT 1;

The inner query chooses about one in one thousand rows (the proportion may need to change depending on how many rows match the various other conditions). This reduced set is then passed to the order by rand() method for choosing just one.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I would avoid to use rand() on any production server with more than 1k rows, mysql rand() is one of the worst thing you can do to kill your server, it will make a full scan even on indexes. – sathia Feb 17 '14 at 13:25
  • 1
    @sathia . . . I don't think the issue is with the pseudo-random number generator itself. The performance issue is in using `order by rand()`. And then only when there is a fair amount of data to be ordered (which might be 1,000 rows or 10,000 rows or 100,000 rows, depending). – Gordon Linoff Feb 17 '14 at 13:27
  • I receive this error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE m.valueid is null and cv.checked = 0 ORDER BY RAND() LIMIT 1' at line 8 – torayeff Feb 17 '14 at 13:29
  • @torayeff . . . Sorry about that. I left part of your original `where` clause in the statement. – Gordon Linoff Feb 17 '14 at 13:45
0

you can take a random number like this:

$whoToTake = rand(1, 1000000);

or if you want it to be dynamic, do a simple count

select count(id) from table;

$max = THE_RESULT_OF_THE_QUERY
$whoToTake = rand(1, $max);

now do a simple query

select * 
from table
limit 1 offset {$whoToTake}

this will be much much faster

Tzook Bar Noy
  • 11,337
  • 14
  • 51
  • 82
0

This is a recurring question on #mysql at freenode IRC.

Take a look at this blog post by Jan Kneschke.

It outlines how you can optimize the ORDER BY RAND() LIMIT 1 so you don't have to first order all rows, and then throw everything away, except for that one row you're keeping.

Jan explains this in more detail than I can here without totally ripping off his blog post.

mrjink
  • 1,131
  • 1
  • 17
  • 28
  • 2
    [How to Answer](http://stackoverflow.com/questions/how-to-answer): ***Provide context for links*** _A link to a potential solution is always welcome, but please add context around the link so your fellow users will have some idea what it is and why it’s there._ ***Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline.*** – BackSlash Feb 17 '14 at 13:42
  • This is more of a comment, than an answer – Alexander Feb 17 '14 at 13:44
  • @BackSlash I edited the post (and +1'd your rightful comment). – mrjink Feb 17 '14 at 13:53