0

I have a few SQL Queries stored in an Excel File.

The original SQL Scripts

I would like to run them on the given SQL database and then store results from the SQL query as well as the original data frame in a separate data frame.

from sqlalchemy import create_engine
import pymssql
engine = create_engine('connection string')

First I tried just storing results in a temp data frame which works fine. Each of these SQL queries give 3-4 rows of output.

df_result = pd.DataFrame()
for row in df.itertuples(): 
    df_temp = pd.read_sql(row.SQL_Query, engine) 
    df_result = df_result.append(df_temp)

The results dataframe

But the goal is store the result along with its associated query/row. The below is obviously wrong because the loop stores results from only last 3 times

df_result = pd.DataFrame()
for row in df.itertuples(): 
    df_temp = pd.read_sql(row.SQL_Query, engine) 
    df_result = pd.merge(df, df_temp, left_index=True, right_index=True)

Incorrect Merge Results

Dhruv
  • 87
  • 1
  • 9
  • [Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451) – Parfait May 07 '19 at 21:09

1 Answers1

0

Consider building a list of data frames with corresponding ID (assumed to be unique for each row). Then concat for single output and finally merge with original data frame on ID.

df_list = [pd.read_sql(row.SQL_Query, engine).assign(ID=row.ID) for row in df.itertuples()]

sql_df = pd.concat(df_list)

df_result = pd.merge(df, sql_df, on="ID")
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • I understand the code in line 1 but can you explain some of the thinking behind it? Might be useful. Like why did you create a list? Also is having the for loop written at the end different than having it at beginning? – Dhruv May 07 '19 at 21:27
  • Sorry didn't read the comment before. Yes, it worked perfectly. – Dhruv May 07 '19 at 21:28
  • First line is a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#tut-listcomps) building a list of data frames which is the preferred way than quadratic copying. Notice `concat` only runs once outside of any loop. – Parfait May 07 '19 at 21:33