I have a large result set (R1) (a few million rows) that I want to quickly select a sample set from.
I am using https://stackoverflow.com/a/25994039/601147 but it is far too slow.
Anybody know of a better solution?
Thanks
I have a large result set (R1) (a few million rows) that I want to quickly select a sample set from.
I am using https://stackoverflow.com/a/25994039/601147 but it is far too slow.
Anybody know of a better solution?
Thanks
If you have an auto-increment column you can query for the smallest and largest ID and let your application generate n
random IDs with lowest < random-ID < largest
that you can select using SELECT ... WHERE ... IN (<your-random-ids>)
.
The number of values in the IN
part is limited on MySQL to max_allowed_packet. Hence, you maybe have to split the list on more than one IN
-queries.
Also not all random IDs may exist, so you will have to check the number of rows returned and fill up the remaining list again with the same procedure until you have enough values.