I need to load a table from SQL Server. It's the first time I'm doing this, so I'm unsure of myself and lack experience in this area.
I need the data of the whole table, which has 64 million rows. To write it into a pandas DataFrame I tried the code below, and I also tried SQLAlchemy with chunksize, following some examples that did not work at all; a sketch of that attempt is included after the working code. Maybe dask is more efficient than pandas, but I have never used dask before (my guess at a dask version is also shown below).
The code works up to TOP 1000000, which takes 7.5 minutes; larger queries, however, do not finish successfully.
Maybe someone can help me figure out the best practice for doing this.
import pandas as pd
import pyodbc

# Connect to the DEV instance with Windows authentication
conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=SERVER\DEV;'
                      'Database=Clusteranalysis;'
                      'Trusted_Connection=yes;')

query = 'SELECT * FROM [Clusteranalysis].[dbo].[Data] WHERE Rc=0'
result_port_map = pd.read_sql(query, conn)
print(result_port_map.head(5))
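For completeness, the SQLAlchemy + chunksize attempt looked roughly like this (reconstructed from memory, so the exact chunk size and connection details may have differed):

import urllib.parse

import pandas as pd
import sqlalchemy

# Wrap the same ODBC connection string in a SQLAlchemy engine
odbc_str = ('Driver={SQL Server};'
            r'Server=SERVER\DEV;'
            'Database=Clusteranalysis;'
            'Trusted_Connection=yes;')
engine = sqlalchemy.create_engine(
    'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(odbc_str))

query = 'SELECT * FROM [Clusteranalysis].[dbo].[Data] WHERE Rc=0'

# Read the result in chunks of 1 million rows and concatenate at the end
chunks = []
for chunk in pd.read_sql(query, engine, chunksize=1_000_000):
    chunks.append(chunk)
result_port_map = pd.concat(chunks, ignore_index=True)
print(result_port_map.shape)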
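As for dask, I have not actually run it yet, but from the documentation it looks like dask.dataframe.read_sql_table would be the way to go, roughly as sketched below; the index_col is only a placeholder, since dask needs an indexed numeric column to split the table into partitions. Is that the right direction?

import urllib.parse

import dask.dataframe as dd

odbc_str = ('Driver={SQL Server};'
            r'Server=SERVER\DEV;'
            'Database=Clusteranalysis;'
            'Trusted_Connection=yes;')
uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(odbc_str)

# Read the table in partitions; 'Id' is a placeholder for an indexed numeric column
ddf = dd.read_sql_table('Data', uri, index_col='Id', schema='dbo')

# Apply the Rc=0 filter lazily and only materialize at the end
ddf = ddf[ddf.Rc == 0]
df = ddf.compute()
print(len(df))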
Thank you for your help!