
I have code that can successfully execute a SQL file and export the results to an Excel file:

import datetime

import cx_Oracle
import pandas as pd

# CONFIG
DATE = datetime.datetime.now().strftime("%Y-%m-%d_%I-%M-%p")
CONFIG = './config'
DATA = './data'
TODAY = f'{DATA}/{DATE}'

# Server config
SQLSERVER = cx_Oracle.makedsn('xxx.xxx.xxx.xxx', '1521', service_name='SERVER')
conn = cx_Oracle.connect(user='xxxxx', password='xxxxxx', dsn=SQLSERVER)

# Read SQL file
fd = open(f'{CONFIG}/1.sql', 'r')
Query1 = fd.read()
fd.close()

# Execute SQL query in chunks; chunksize makes read_sql_query return an
# iterator of DataFrames (the chunk size itself is arbitrary)
dfs = []
for chunk in pd.read_sql_query(Query1, con=conn, chunksize=10000):
    dfs.append(chunk)
df = pd.concat(dfs)

# Write results to Excel file
df.to_excel(f'{TODAY}/Query1.xlsx', engine='xlsxwriter', index=False)

This is all working great, as expected. The problem is that I cannot put it in a loop: the config folder holds around 60 SQL files, and every week I get more. How do I loop over all of the SQL files automatically and write a separate Excel file for each query?

vali
  • Instead of reading a single file, you could read all the files in a directory and then loop over them to generate the desired results. You can use an index in the loop as part of the output file name to create unique files. Here is how you can read all files in a directory using Python: https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory – hassansuhaib Mar 29 '23 at 11:37
  • I already tried that and I'm getting an error: `pandas.io.sql.DatabaseError: Execution failed on sql '['./config/1.sql','........']': expecting string or bytes object` – vali Mar 29 '23 at 12:00
  • Do some debugging and check what is being read / executed in each loop. – Christopher Jones Mar 30 '23 at 01:48

1 Answer


You can solve this at the shell level by running the Python script once per SQL file.

In Bash:

for SQL_SCRIPT in scripts_dir/*; do
  python run_sql_scripts.py "$SQL_SCRIPT"
done

This will run:

python run_sql_scripts.py scripts_dir/1.sql
python run_sql_scripts.py scripts_dir/2.sql
python run_sql_scripts.py scripts_dir/foo.sql
python run_sql_scripts.py scripts_dir/bar.sql
...

Then make your Python script dynamic so it runs once against a single SQL script passed as input, and uses the script's name to build the name of the output XLSX file.

import sys
from pathlib import Path

# sys.argv always contains the script name itself, so check for a second element.
assert len(sys.argv) > 1, 'Must provide SQL_SCRIPT as argument for the script'
sql_script = sys.argv[1]
# e.g. 'scripts_dir/1.sql'

# ...

# Using a `with` block to open and close the file for you. Mode 'r' is the default, so I left it out.
with open(sql_script) as fd:
    Query1 = fd.read()

# ...


# Path.stem drops the directory and the '.sql' extension.
# (str.strip('.sql') would be a bug: it strips any of the characters
# '.', 's', 'q', 'l' from both ends rather than removing the suffix.)
name = Path(sql_script).stem
# e.g. '1'

out_path = f'{TODAY}/{name}.xlsx'
# e.g. TODAY/1.xlsx

df.to_excel(out_path, engine='xlsxwriter', index=False)
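
One pitfall worth guarding against, as a small sketch assuming the same `TODAY` variable as in the question: `to_excel` does not create missing directories, so make sure the dated output folder exists before writing.

import os

# Create the dated output directory if needed; exist_ok=True makes re-runs safe.
os.makedirs(TODAY, exist_ok=True)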

Otherwise you can put all that logic in the Python script in a function. Then you can get the list of files in the directory using Python instead of Bash and loop over them.

import glob

def sql_to_xlsx(path):
    # your logic here
    # ...


paths = glob.glob(f"{CONFIG}/*.sql")
# e.g. ['./config/1.sql', './config/2.sql']

for path in paths:
    sql_to_xlsx(path)
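
For a fuller picture, here is one way `sql_to_xlsx` could look — a sketch only, assuming the `conn`, `CONFIG`, and `TODAY` variables from the question's code. Passing the connection in as a parameter lets all the files share one connection instead of reconnecting per script (see the comment below):

import glob
import os
from pathlib import Path

import pandas as pd

def sql_to_xlsx(path, conn):
    # Read the query text from the .sql file.
    with open(path) as fd:
        query = fd.read()

    # Run the query in chunks and combine the results into one DataFrame.
    df = pd.concat(pd.read_sql_query(query, con=conn, chunksize=10000))

    # Name the output after the input, e.g. './config/1.sql' -> '1.xlsx'.
    df.to_excel(f'{TODAY}/{Path(path).stem}.xlsx', engine='xlsxwriter', index=False)


os.makedirs(TODAY, exist_ok=True)
for path in glob.glob(f'{CONFIG}/*.sql'):
    sql_to_xlsx(path, conn)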

Reference from my notes on the built-in `glob` module: https://michaelcurrin.github.io/dev-cheatsheets/cheatsheets/python/files-and-paths/glob.html

Michael Currin
  • Although there may be valid cases, the overhead of running multiple scripts and therefore re-connecting in each can be a big problem. It's better to do the looping inside Python and reuse the connection. – Christopher Jones Mar 30 '23 at 01:46