0

I have a table with 3,000,000 records.I tried to randomly extract 300,000 records using the following method,but it takes about 7 minutes.

SELECT * FROM mytable WHERE `class`='faq' ORDER BY RAND() LIMIT 300000

I want to improve the speed of random extraction, what should I do? Mysql version is 5.6.

wangtianye
  • 306
  • 1
  • 5

1 Answers1

0

The cost is most likely due to sorting all the matching data. You don't specify how many rows match the condition, so this sort is likely to be some fraction of 3,000,000 rows.

If you can deal with approximately 300,000, you can use sampling logic in the WHERE clause:

SELECT t.*
FROM mytable t CROSS JOIN
     (SELECT COUNT(*) as cnt
      FROM t
      WHERE class = 'faq'
     ) x
WHERE t.class = 'faq' AND
      rand() < (300000 / cnt);

To be more precise, you can take a slightly larger random sample and then use order by/limit:

SELECT t.*
FROM mytable t CROSS JOIN
     (SELECT COUNT(*) as cnt
      FROM t
      WHERE class = 'faq'
     ) x
WHERE t.class = 'faq' AND
      rand() < (300000 / cnt) * 1.1
ORDER BY rand()
LIMIT 300000;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I am of a long held understanding that MySQL `RAND` is a very inefficient and slow process and where at all possible it's better to simply avoid it and / or pass the full data set to the output and let the output (PHP, etc.) pick random results in a faster, more efficient manner. Maybe MySQL 8.0 revolutionised this? hah – Martin Jul 30 '19 at 11:52
  • @Martin . . . Sorting is expensive. I'm not aware that `rand()` is "very inefficient". In general, it is better to reduce the amount of data passed to applications and `rand()` is one way to do that. – Gordon Linoff Jul 30 '19 at 12:21