I am trying to read a very large MySQL table containing several million rows. I am using the pandas library and reading the result in chunks. See the code below:
import pandas as pd
import numpy as np
import pymysql.cursors

connection = pymysql.connect(user='xxx', password='xxx', database='xxx', host='xxx')
try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM example_table;"
        chunks = []
        # Read the result set 1000 rows at a time and keep every chunk
        for chunk in pd.read_sql(query, connection, chunksize=1000):
            chunks.append(chunk)
        #print(len(chunks))
        # Combine all chunks into a single DataFrame
        result = pd.concat(chunks, ignore_index=True)
        #print(type(result))
        #print(result)
finally:
    print("Done!")
    connection.close()
The execution time is acceptable if I limit the number of rows selected. But if I select even a modest share of the data (for example, 1 million rows), the execution time increases dramatically.
Is there a better/faster way to select data from a relational database in Python?
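
For comparison, here is a minimal sketch of batch-wise reading through the plain DB-API cursor, where each batch is consumed and discarded instead of being collected into one large DataFrame. The helper names iter_batches and count_rows are hypothetical (they are not part of pandas or pymysql), and sqlite3 is used only as a stand-in driver so the sketch runs without a MySQL server. With pymysql, passing cursorclass=pymysql.cursors.SSCursor to pymysql.connect should give an unbuffered, server-side cursor; whether that actually reduces the run time for this table is an assumption that has not been measured here.

```python
import sqlite3  # stand-in driver (assumption) so the sketch runs without MySQL


def iter_batches(connection, query, batch_size=10_000):
    """Yield lists of rows from a DB-API connection, batch_size rows at a time."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break  # result set exhausted
            yield rows
    finally:
        cursor.close()


def count_rows(connection, query, batch_size=10_000):
    """Example consumer: aggregate per batch instead of keeping every row."""
    total = 0
    for rows in iter_batches(connection, query, batch_size):
        total += len(rows)
    return total


if __name__ == "__main__":
    # Hypothetical demo data; example_table mirrors the name used in the question.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE example_table (id INTEGER, value TEXT)")
    demo.executemany(
        "INSERT INTO example_table VALUES (?, ?)",
        [(i, str(i)) for i in range(25)],
    )
    print(count_rows(demo, "SELECT * FROM example_table", batch_size=10))
    demo.close()
```

The point of the generator is that only one batch is held in memory at a time, so it only pays off when the downstream work (counting, summing, writing to a file) can be done per batch rather than on the full table.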