0

What would be the best approach to use Python as an ETL tool when an SSIS package cannot be deployed? There are 8 SQL queries that are fairly complex which need to be executed daily and its outputs saved as csv files to a specific folder.

I found a couple of possible solutions here:

  1. Automatically scheduling SQL query results to be exported to a csv file

  2. How to automatically save sql query results to .csv file

  3. Connect to Linked Server via SQL Server pyodbc connection?

The first 2 solutions require either an SSIS package, SQL Agent Job, or the bcp utility. Ideally, I'd like to proceed with Python as in the third option since this would allow me to integrate the ETL code into my current workflow. This would also allow me to learn more about Python's ETL capabilities.

Right now, I have 8 .sql files that I execute every day and I'd like to proceed with PyODBC to establish the connection to the SQL database and its tables. Each query is about 300 lines of code and it selects data from the database as well as databases on a linked server.

This is what I have so far in Python:

import pyodbc
import pandas as pd
import os
import sys

# Establish connection to the SQL database and appropriate  table
conn = pyodbc.connect('Driver={'+driver+'};'
                      'Server='+server+';'
                      'Database='+database+';'
                      'Trusted_Connection=yes;')

path = 'C:\\User\\Me\\path\\'

# Define the query you'd like to execute:
query = open(path + 'Dynamic Query - Import Data 02_27_20.sql', 'r')

# For some reason this line of code times out and gives me an error message
df = pd.read_sql_query(query.read(), conn)

df.to_csv(output_folder + filename, index = False)

The error I have here is when I try to execute the .sql file into Python. After I run this line of code, the execution times out and I receive the following error: "TypeError: 'NoneType' object is not iterable."

If I'm able to execute the query in Python and have its results saved as a dataframe, I can then proceed to export this dataframe as a csv file to my desired location.

Any help would be greatly appreciated. Thanks!

JorgeSD
  • 1
  • 2
  • 1
    not sure if they'll work, so just take 'em as suggestions. I think you would be better served by using [pathlib](https://docs.python.org/3/library/pathlib.html) as u can mod ur path to be operating system independent. The module will take care of the underlying magic. for the query path that fails to open, kindly test it. what happens when you run query.read() by itself? do u get any output on the screen or terminal? is it possible to get that string into a variable, and then pass the variable into ur pandas read query? also, you could use a context manager 2 ensure file is closed wen u r done – sammywemmy Apr 01 '20 at 06:26
  • Are you sure that you are connecting to the database and have appropriate rights? Microsoft SQL Server is very picky. Are you sure that your query is valid SQL? Is it being read in correctly? Here is where using the REPL and manually running commands comes in handy or some strategic print statements can show you what is going on. Also if include the complete traceback error message you might get more help. As it stands we cannot tell of the query is None, the connection, the result from pandas or the dataframe. – WombatPM Apr 01 '20 at 06:44
  • Try prepending your SQL text with `SET NOCOUNT ON;` before executing it. – Gord Thompson Apr 01 '20 at 11:55
  • I am not a python expert by any means, but how about you test your query with something simple like "select getdate() as [today]" and see if that works. If that works then the problem lies in your query. – KeithL Apr 01 '20 at 15:32
  • Thank you all for the responses! It turns out that we went ahead with a different solution other than Python. – JorgeSD Apr 19 '20 at 00:57

0 Answers0