2

Imagine a sql table with a very high number of rows (for example 300 million of rows).

Each row have an text field and I'd like to retrieve 10 rows randomly.

The problem is that using

select key from the_table where enabled=1 order by rand() limit 10

Calculate the rand value and order all the rows by that value is a very high load.

Is there some way for avoiding that?

Addev
  • 31,819
  • 51
  • 183
  • 302
  • 1
    possible duplicate of [quick selection of a random row from a large table in mysql](http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql) or http://stackoverflow.com/questions/1594483/optimizing-my-mysql-statement-rand-too-slow – Evan Mulawski Jun 01 '12 at 15:48
  • 1
    http://jan.kneschke.de/projects/mysql/order-by-rand/ – Tim Schmelter Jun 01 '12 at 15:48
  • 1
    +1 I love the question, so far all answer haven't been perfect, in your problem you warranted random without repeating, all answer repeating is possible... so there is not a solution yet @EvanMulawski – jcho360 Jun 01 '12 at 16:35

2 Answers2

1

If you can generate the keys outside of SQL (even if it takes using 15-20 to get 10 that are enabled) you can then skip the order() which is almost certainly what the problem is wrt to load

SELECT key WHERE key IN [x,y,z....]

where x,y,z are generated randomly and externally.

Even better, you can create a table of enabled keys, sorted randomly and just page through that in chunks of 10. You'd need to update it occasionally and there's a penalty in space but that's probably not THAT big a deal.

There's more sophisticated solutions, but that would get you moving.

John Paul Ashenfelter
  • 3,135
  • 1
  • 22
  • 29
1

I'd test (time) the possible solutions to verify they are indeed as slow as you suspect.

If your primary keys are sequential or otherwise predicatable, perhaps you could generate 10 random ones from code or as part of a procedure, then SELECT the 10 records by key.

For example, if your keys were sequential numeric IDs, then you'd generate 10 RAND numbers between Min(ID) and Max(ID) and use WHERE Id in (rand1, rand2, rand3 ...)

Widor
  • 13,003
  • 7
  • 42
  • 64
  • @wildor but it's possible that he could repeat values – jcho360 Jun 01 '12 at 16:16
  • @jcho360 depends on how the random numbers are generated. One way is to generate the whole set and randomise the order, then take the top `x` many results that you need. – Widor Jun 01 '12 at 16:20