I imagine a table with, say, a million entries. You want to pick a row randomly, so you generate one random number per row, i.e. a million random numbers, and then seek the row with the minimum generated number. There are two tasks involved:
- generating all those numbers
- finding the minimum number
and then accessing the record of course.
If you wanted more than one row, the DBMS could sort all records and then return n records, but hopefully it would rather apply some part-sort operation where it only detects the n minimum numbers. Quite some task anyway.
There is no thorough way to circumvent this, I guess. If you want random access, this is the way to go.
If you would be ready to live with a less random result, however, I'd suggest to make ID buckets. Imagine ID buckets 000000-0999999, 100000-1999999, ... Then randomly choose one bucket and of this pick your random rows. Well, admittedly, this doesn't look very random and you would either get only old or only new records with such buckets; but it illustrates the technique.
Instead of creating the buckets by value, you'd create them with a modulo function. id % 1000
would give you 1000 buckets. The first with IDs xxx000, the second with IDs xxx001. This would solve the new/old records thing and get the buckets balanced. As IDs are a mere technical thing, it doesn't matter at all that the drawn IDs look so similar. And even if that bothers you, then don't make 1000 buckets, but say 997.
Now create a computed column:
alter table mytable add column bucket int generated always as (id % 997) stored;
Add an index:
create index idx on mytable(bucket);
And query the data:
select *
from mytable
where bucket = floor(rand() * 998)
order by rand()
limit 10;
Only about 0.1% of the table gets into the sorting here. So this should be rather fast. But I suppose that only pays with a very large table and a high number of buckets.
Disadvantages of the technique:
- It can happen that you don't get as many rows as you want and you'd have to query again then.
- You must choose the modulo number wisely. If there are just two thousand records in the table, you wouldn't make 1000 buckets of course, but maybe 100 and never demand more than, say, ten rows at a time.
- If the table grows and grows, a once chosen number may no longer be optimal and you might want to alter it.
Rextester link: http://rextester.com/VDPIU7354
UPDATE: It just dawned on me that the buckets would be really random, if the generated column would not be based on a modulo on the ID, but on a RAND
value instead:
alter table mytable add column bucket int generated always as (floor(rand() * 1000)) stored;
but MySQL throws an error "Expression of generated column 'bucket' contains a disallowed function". This doesn't seem to make sense, as a non-deterministic function should be okay with the STORED
option, but at least in version 5.7.12 this doesn't work. Maybe in some later version?