0

I have a very large table in a MySQL database with a column named exa_id and more than 10,000,000 rows. I want to randomly and efficiently select just 1000 rows through a pandas.read_sql call in Python. How can I write the code?

The SQL select exa_id from table_name order by rand() limit 1000 performs really badly, so I'd like to find another way.

One more note: the contents of column exa_id are strings, like 'uudjsx-2220983-df', 'ujxnas-9800xdsd-d2', ..., not an auto-incrementing sequence.
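
For reference, this is roughly what I'm running now through pandas (a minimal sketch; the connection string is a placeholder), and it is far too slow:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials: replace with your own MySQL connection details.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# ORDER BY RAND() sorts all 10,000,000+ rows just to keep 1000 of them.
query = "select exa_id from table_name order by rand() limit 1000"
df = pd.read_sql(query, engine)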

CoffeeSun

2 Answers

1

This works under most circumstances:

select exa_id
from table_name t
where rand() < 2000 / 10000000
order by rand()
limit 1000;

The where clause keeps approximately 2000 rows; there is some statistical variability. The order by rand() then shuffles those rows and the limit returns 1000 of them.

If you don't know the number of rows, you can do:

select t.exa_id
from table_name t cross join
     (select count(*) as cnt from table_name) tt
where rand() < 2000 / tt.cnt
order by rand()
limit 1000;
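
To get this into pandas, you can pass either query straight to pandas.read_sql. A minimal sketch, assuming a SQLAlchemy engine (the connection string below is a placeholder):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials: adjust for your own server and schema.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

sample_sql = """
    select t.exa_id
    from table_name t cross join
         (select count(*) as cnt from table_name) tt
    where rand() < 2000 / tt.cnt
    order by rand()
    limit 1000
"""

# Only ~2000 rows survive the where filter, so the final sort is cheap.
sample_df = pd.read_sql(sample_sql, engine)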
Gordon Linoff
0

This query will help you. It picks a random id between 1 and the current maximum and returns the rows starting from that point.

SELECT name  FROM random AS r1
JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2 
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 100
JohnR