I believe SQL Server is multithreaded and can respond to multiple simultaneous requests, so you can take advantage of multiprocessing on the client side:
import multiprocessing as mp
import pandas as pd
import pyodbc

conn_str = '...'  # your pyodbc connection string
n_procs = mp.cpu_count() - 1
batch_size = 50000
# OFFSET/FETCH requires an ORDER BY in SQL Server, so page on a key column
query = '''your query...
ORDER BY your_key_column
OFFSET {} ROWS
FETCH NEXT {} ROWS ONLY;'''

def read_batch(batch):
    # each worker opens its own connection: pyodbc connections can't be shared across processes
    conn = pyodbc.connect(conn_str)
    df = pd.read_sql(query.format(batch * batch_size, batch_size), conn)
    conn.close()
    return df

if __name__ == '__main__':  # guard required by multiprocessing on Windows/macOS
    tot_rows = pd.read_sql('select count(*) as tot_rows from your_table', pyodbc.connect(conn_str)).iloc[0, 0]
    with mp.Pool(n_procs) as pool:
        dfs = pool.map(read_batch, range(tot_rows // batch_size + 1))
The idea is to batch the query using OFFSET and FETCH in SQL Server and have multiple processes on your machine request these batches simultaneously.
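For instance, the worker for batch 2 would format the paging clause like this (just a sketch, reusing the placeholder query and your_key_column from above):

print(query.format(2 * batch_size, batch_size))
# your query...
# ORDER BY your_key_column
# OFFSET 100000 ROWS
# FETCH NEXT 50000 ROWS ONLY;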
You could then concat the batches into one table and write the CSV:
pd.concat(dfs, ignore_index=True).to_csv("sample.csv", index=False)
or exploit multiprocessing again and write one file per batch:
def to_csv(args):
    # unpack a (dataframe, filename) pair so the jobs can be dispatched with pool.map
    df, name = args
    df.to_csv(name, index=False)

with mp.Pool(n_procs) as pool:  # run under the same __main__ guard as above
    pool.map(to_csv, [(df, f'sample_{i}.csv') for i, df in enumerate(dfs)])
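Writing one file per batch also avoids materializing the full concatenated frame in memory; if you do need a single CSV, the per-batch files can be appended together afterwards.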