-1

I have a large result set (R1) (a few million rows) that I want to quickly select a sample set from.

I am using https://stackoverflow.com/a/25994039/601147 but it is far too slow.

Anybody know of a better solution?
Thanks

dewijones92
  • 1,319
  • 2
  • 24
  • 45

1 Answers1

0

If you have an auto-increment column you can query for the smallest and largest ID and let your application generate n random IDs with lowest < random-ID < largest that you can select using SELECT ... WHERE ... IN (<your-random-ids>).

The number of values in the IN part is limited on MySQL to max_allowed_packet. Hence, you maybe have to split the list on more than one IN-queries.

Also not all random IDs may exist, so you will have to check the number of rows returned and fill up the remaining list again with the same procedure until you have enough values.

siom
  • 1,610
  • 1
  • 14
  • 21