I have a very large table in a MySQL database with a column named exa_id;
the table has more than 10,000,000 rows. I want to randomly and efficiently select just 1,000 of those rows with a pandas.read_sql
statement in Python. How can I write the code?
The SQL select exa_id from table_name order by rand() limit 1000
performs really badly, so I'd like to find another way.
One more detail: the values in column exa_id
are strings such as 'uudjsx-2220983-df', 'ujxnas-9800xdsd-d2', ..., not an auto-incrementing sequence.