I currently use pyodbc to read in as pandas dataframe, then I convert it to dask dataframe. Is there anyway to read in as Dask dataframe directly?
Here's the code I'm currently using:
import pandas as pd
import numpy as np
from dask.dataframe import from_pandas
def conn_sql_server(file_path):
#Connect to SQL Server
conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
'Server= Server1;'
'Database = Database1;'
'Trusted_Connection=yes;')
#run query and ouput the result to df
query = open(file_path, 'r')
df = pd.read_sql_query(query.read(), conn, chunksize=10**4)
chunks =[]
for chunk in df:
chunks.append(chunk)
df_comb = pd.concat(chunks)
query.close()
return df_comb
#load in as pandas dataframe
data = conn_sql_server('.\input\data pull.sql')
#Convert to dask dataframe
dd = from_pandas(data, npartitions=3)
I tried to use dd.read_sql_query
with pyodbc package or sqlalchemy package. Both returned an AttributeError: module 'dask.dataframe' has no attribute 'read_sql_query'
(1) pyodbc:
import dask.dataframe as dd
def conn_sql_server(file_path):
#Connect to SQL Server
conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
'Server= Server1;'
'Database = Database1;'
'Trusted_Connection=yes;')
#run query and ouput the result to df
query = open(file_path, 'r')
df = dd.read_sql_query(query.read(), conn)
query.close()
return df
data = conn_sql_server('.\input\data pull.sql')
AttributeError: module 'dask.dataframe' has no attribute 'read_sql_query'
(2) sqlalchemy:
from sqlalchemy import create_engine
Server= 'Server1'
Database = 'Database1'
Driver= 'SQL Server Native Client 11.0'
uri = f'mssql://{Server}/{Database}?driver={Driver}'
query = open('.\input\data pull.sql', 'r')
dd.read_sql_query((query.read(), uri))
As Nick suggested, I've upgrade dask to the latest version using python -m pip install dask distributed --upgrade
. I also checked all of the functions listed in the dask.dataframe module using the following script. Found out that there is only read_sql_table, no read_sql_query
from inspect import getmembers, isfunction
import dask.dataframe as dd
getmembers(dd, isfunction)