I've written some code to run all the .sql files in a given directory and write the output of each to a CSV file. However, I'm running into an issue with one very large query: it eats up all my RAM and then crashes the process. The query output really is going to be that large, so the only solution I can see is to write the output directly to the file as it's read.
The script in question:
import pandas as pd
from sqlalchemy.engine import URL
from sqlalchemy import create_engine

serverName = 'FOO'
databaseName = 'BAR'

# Build a trusted (Windows auth) connection to SQL Server via pyodbc
connection_string = ("Driver={SQL Server};"
                     "Server=" + serverName + ";"
                     "Database=" + databaseName + ";"
                     "Trusted_Connection=yes;")
connection_url = URL.create(
    "mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)

# Stand-in for the contents of one of the .sql files
sqlFile = "select 'Hello world' as 'Header'"

# Run the SQL (sqlFile) on the connection (engine) and write the result to CSV
pd.read_sql(sqlFile, engine).to_csv('foo.csv', header=True, index=False, sep='|')
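For context, the rest of the script just loops over the .sql files in the directory along these lines (the directory path and the output-naming convention below are placeholders, not my real ones):

from pathlib import Path

sql_dir = Path(r'C:\queries')  # placeholder directory containing the .sql files

for sql_path in sql_dir.glob('*.sql'):
    query = sql_path.read_text()
    out_path = sql_path.with_suffix('.csv')  # one CSV per query, placeholder naming
    # The full result set is materialised here, which is what exhausts RAM
    # on the very large query
    pd.read_sql(query, engine).to_csv(out_path, header=True, index=False, sep='|')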
I had previously read the SQL result into a variable (a DataFrame) and then written that out to CSV:
data = pd.read_sql(sqlFile, engine)
data.to_csv(file_name)
I was hoping that by changing to the following, the output would write directly to the file, since I'd removed the explicit "store the output in a variable" step:
pd.read_sql(sqlFile, engine).to_csv(file_name)
Chaining the calls made no difference, though, presumably because pd.read_sql still builds the entire DataFrame in memory before to_csv ever runs. How can I have the output of a SQL query written directly to a file, row by row, without storing the entire output in memory first?
(Edited down to minimally runnable code.)
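A possible direction, as far as I can tell, is the chunksize argument to pd.read_sql, which returns an iterator of DataFrames instead of one big frame. A minimal sketch of what I think that would look like (the chunk size and file name are arbitrary, and I don't know whether the SQL Server ODBC driver actually streams the rows rather than fetching them all up front):

# Sketch only: write each chunk as it arrives, so at most one chunk is held
# in memory at a time. 50,000 rows per chunk is an arbitrary choice.
with open('foo.csv', 'w', newline='') as f:
    for i, chunk in enumerate(pd.read_sql(sqlFile, engine, chunksize=50000)):
        chunk.to_csv(f, header=(i == 0), index=False, sep='|')  # header only on the first chunk

I don't know whether this is the right approach, or whether something at the SQLAlchemy/pyodbc level (e.g. server-side cursors) is needed, hence the question.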