1

My question is based on the SO MySQL select 10 random rows from 600K rows fast. It is known

SELECT column FROM table
ORDER BY RAND()
LIMIT 10

is too slow for huge tables, another tricks and methods are used to extract some rows.

But what if I use WHERE:

SELECT column FROM table
WHERE colA=123
ORDER BY RAND()
LIMIT 10

What about the performance, if WHERE actually excludes at least 99.99% wrong rows among 600k? By another words, what works first in this query - WHERE or ORDER BY RAND()?

If WHERE works first, does this mean ORDER BY RAND () sort only 60 rows (not 600k) and works fast?

Community
  • 1
  • 1
Haradzieniec
  • 9,086
  • 31
  • 117
  • 212

2 Answers2

3

If this performs well (fast enough for you) and returns not many rows (say, less than a 1000):

SELECT column FROM table
WHERE colA=123 ;

Then this will perform well, too, because it will sort only the (less than 1000) rows of the previous query:

SELECT column FROM table
WHERE colA=123
ORDER BY RAND()
LIMIT 10 ;

If you want to be dead sure that it will perform pretty well, even if the first query returns many thousands or million rows, you can use this, which will limit the sorting to maximum 1000 (or a number of your choice) rows:

SELECT column
FROM
  ( SELECT column FROM table
    WHERE colA=123
    LIMIT 1000
  ) AS tmp
ORDER BY RAND()
LIMIT 10 ;

The downside is that if there are indeed many rows, the 1000-cut will be arbitrary and indeterminate but not random. It will probably be done based on the indexes used for the query.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
2

WHERE works first and yes, then ORDER BY gets less number of rows to sort, but more time was freed up by WHERE

Also run EXPLAIN On your query, that will also indicate why this speed or delay

Hanky Panky
  • 46,730
  • 8
  • 72
  • 95