How can I improve the speed of this query for getting a random row

Question

This query fetches a random Movie from a table of 500k records after some where clauses. This query's speed ranges from 0.016s to 0.450s. Can anyone see a way of improving it to its around the 0.016 more than the 0.450 mark?

SELECT  movie.ID, imdbID, Title, Y
ear, Rating, Runtime, Genre,  Metacritic,
imdbRating, imdbVotes, Poster,  FullPlot, 
Language,trailerUrl, type  
from moviedb.movie 
INNER JOIN (
    SELECT RAND()*(
        SELECT MAX(ID) 
        FROM movie
    ) AS ID) AS t 
ON movie.ID >= t.ID
WHERE year > 2004 
AND year < 2015 
AND imdbRating > 6.9 
AND imdbvotes > 9999.9   
ORDER BY movie.id LIMIT 1

@MarcB it is not a dupe. I am trying to improve a query which is similar to the top answer in that question but it different since that answer gets you random rows without concidering any where conditions. also i am not trying to get 10 rows, only one — 124697, Nov 05 '14 at 21:56
doesn't matter if it's not your exact query. it's the concepts in the answer that matter. — Marc B, Nov 05 '14 at 21:58

Gordon Linoff · Answer 1 · 2014-11-05T22:02:52.143

Your query is basically this:

SELECT m.*
from moviedb.movie m JOIN
     (SELECT RAND()*MAX(ID) as maxid
      FROM movie
     )  mm
     ON m.ID >= mm.max
WHERE m.year > 2004 AND m.year < 2015 AND m.imdbRating > 6.9 AND m.imdbvotes > 9999.9   
ORDER BY m.id
LIMIT 1;

I suspect the performance variation is caused by the order by. There may be little you can do to fix this, unless you can convince MySQL to use an index for the order by. Such an index would be id, year, imdbrating, imdbvotes. This is a covering index for the where and order by clauses, and MySQL might use it to avoid a filesort.

Another approach would be to introduce flags for the rating and vote counts. Then you could phrase the query as:

where m.year > 2004 and m.year < 2015 and RatingGreatFlag = 1 and LotsaVotesFlag = 1

Then an index on RatingGreatFlag, LotsVotesFlag, id, year would probably be a big help. However, maintaining these flags would probably require using triggers.

EDIT:

As I think about this problem, I wonder if removing the order by would still produce what you want. This would given an indeterminate row after the random id. There are definitely ways to choose such an indeterminate row that would be bad (such as the one with the highest id), but in practice it might work very well.

Thanks for that answer. removing the orderby seems to make a very slight improvement. But i am not sure if it is wise to do it, doesn't it impact on the randomness of the result? — 124697, Nov 05 '14 at 22:18
@code578841441 . . . The question is whether the disparity in time is due to the `order by` (when the random id is small) or due to the search for a matching record. — Gordon Linoff, Nov 05 '14 at 22:40

Bohemian · Answer 2 · 2014-11-05T22:16:43.050

0

A simple technique is to use a hashing function on some data in the row with a varying salt:

select blah, blah, blah
...
order by password(concat(id, unix_timestamp()))
limit 1

The varying salt ensures that each execution produces a different random row. You still need to add some data from the row to ensure each row is ordered differently to every other row.

edited Nov 05 '14 at 22:16

answered Nov 05 '14 at 22:09

Bohemian

412,405
93
575
722

This is slow. consistently getting around 0.250s – 124697 Nov 05 '14 at 22:19

How can I improve the speed of this query for getting a random row

2 Answers2