What would be the best approach to using Python as an ETL tool when an SSIS package cannot be deployed? There are 8 fairly complex SQL queries that need to be executed daily, with their outputs saved as CSV files to a specific folder.
I found a couple of possible solutions here:
The first 2 solutions require either an SSIS package, SQL Agent Job, or the bcp utility. Ideally, I'd like to proceed with Python as in the third option since this would allow me to integrate the ETL code into my current workflow. This would also allow me to learn more about Python's ETL capabilities.
Right now, I have 8 .sql files that I execute every day, and I'd like to use pyodbc to establish the connection to the SQL database and its tables. Each query is about 300 lines long and selects data from the database as well as from databases on a linked server.
This is what I have so far in Python:
import pyodbc
import pandas as pd

# Establish the connection to the SQL database
# (driver, server, and database are defined earlier in my script)
conn = pyodbc.connect('Driver={' + driver + '};'
                      'Server=' + server + ';'
                      'Database=' + database + ';'
                      'Trusted_Connection=yes;')

path = 'C:\\User\\Me\\path\\'

# Read the query text from the .sql file
with open(path + 'Dynamic Query - Import Data 02_27_20.sql', 'r') as f:
    query = f.read()

# For some reason this line times out and gives me an error message
df = pd.read_sql_query(query, conn)

# output_folder and filename are defined earlier in my script
df.to_csv(output_folder + filename, index=False)
The error occurs when I try to execute the query from the .sql file in Python. After I run the pd.read_sql_query line, the execution times out and I receive the following error: "TypeError: 'NoneType' object is not iterable."
If I'm able to execute the query in Python and have its results saved as a DataFrame, I can then export that DataFrame as a CSV file to my desired location.
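For context, the end state I'm picturing is a loop that runs every .sql file in a folder and writes one CSV per query. Here's a self-contained sketch of that loop; I've used sqlite3, an in-memory table, and temp folders as stand-ins so it runs anywhere, but in my real script the connection would be the pyodbc one above and the query files would be my 8 .sql files:

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Stand-in folders; in my real script these point at my .sql and output paths
sql_folder = tempfile.mkdtemp()
output_folder = tempfile.mkdtemp()

# Stand-in connection; in my real script this is the pyodbc connection
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (id INTEGER, amount INTEGER)')
conn.execute('INSERT INTO sales VALUES (1, 10), (2, 20)')

# Two stand-in query files; my real ones are ~300 lines each
for name, text in [('daily_sales.sql', 'SELECT * FROM sales'),
                   ('totals.sql', 'SELECT SUM(amount) AS total FROM sales')]:
    with open(os.path.join(sql_folder, name), 'w') as f:
        f.write(text)

# The loop I'm after: run each .sql file, save its result set as a CSV
for sql_file in sorted(os.listdir(sql_folder)):
    if not sql_file.endswith('.sql'):
        continue
    with open(os.path.join(sql_folder, sql_file)) as f:
        query = f.read()
    df = pd.read_sql_query(query, conn)
    csv_name = os.path.splitext(sql_file)[0] + '.csv'
    df.to_csv(os.path.join(output_folder, csv_name), index=False)

conn.close()
```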
Any help would be greatly appreciated. Thanks!