I'm trying to write a dynamic SQL export function for pandas DataFrames. So far I'm reading the DataFrame's column dtypes to build the "create table" query. Since T-SQL doesn't have CREATE TABLE IF NOT EXISTS
functionality, I had to write this workaround.
The code looks like this, where data_df is a non-empty pandas DataFrame:
table_columns = data_df.columns
target_table = 'Some table name'
col_dtype = []
data_coldtypes = data_df.dtypes.apply(lambda x: x.name).to_dict()
for k, v in data_coldtypes.items():
    if 'int' in v.lower():
        col_dtype.append(k + ' bigint')
    elif 'float' in v.lower():
        col_dtype.append(k + ' float')
    elif 'datetime' in v.lower():
        col_dtype.append(k + ' datetime')
    else:
        col_dtype.append(k + ' varchar(max)')
create_query = f"""IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='{target_table}' AND xtype='U')
CREATE TABLE [dbo].[{target_table}] ({', '.join(col_dtype)});"""
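As a sanity check, the dtype-to-T-SQL mapping above can be run standalone on a toy DataFrame (the column names below are made up purely for illustration):

```python
import pandas as pd

# Hypothetical DataFrame covering all four dtype branches
demo_df = pd.DataFrame({
    'id': [1, 2],
    'price': [1.5, 2.5],
    'created': pd.to_datetime(['2021-01-01', '2021-01-02']),
    'name': ['a', 'b'],
})

col_dtype = []
for k, v in demo_df.dtypes.apply(lambda x: x.name).to_dict().items():
    if 'int' in v.lower():
        col_dtype.append(k + ' bigint')
    elif 'float' in v.lower():
        col_dtype.append(k + ' float')
    elif 'datetime' in v.lower():
        col_dtype.append(k + ' datetime')
    else:
        col_dtype.append(k + ' varchar(max)')

print(col_dtype)
# ['id bigint', 'price float', 'created datetime', 'name varchar(max)']
```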
Now I also want to fill the newly created table directly with the data from this DataFrame, which is why I also build an INSERT query, looking like this:
insert_query = f"INSERT INTO [dbo].[{target_table}] " \
               f"([{'], ['.join(table_columns)}]) VALUES " \
               f"({', '.join(['?'] * len(table_columns))});"

# Transforming the dataframe to a list of tuples
dataset_tuple = [tuple(None if pd.isna(value) else value for value in row)
                 for row in data_df.to_numpy()]
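The NaN-to-None conversion can likewise be checked in isolation; pd.isna handles NaN, None, and NaT alike (toy data below, invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values in a float and an object column
demo_df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})

dataset_tuple = [tuple(None if pd.isna(value) else value for value in row)
                 for row in demo_df.to_numpy()]

print(dataset_tuple)
# [(1.0, 'x'), (None, None)]
```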
Each statement works on its own, but now comes the weird part that I don't understand. When I try to execute both statements with fast_executemany = True,
the insert statement is skipped. The code looks like this, where get_connection
is just a helper function that returns the connection for the respective server:
with get_connection(db_conn_url, 'mssql') as conn:
    with conn.cursor() as cursor:
        cursor.fast_executemany = True
        cursor.execute(create_query)
        cursor.executemany(insert_query, dataset_tuple)
        print('TEST: inserted')
With this code the table is created successfully but not filled (the print statement at the end is never reached), yet there is no error message. Commenting out either the cursor.fast_executemany
option or the cursor.execute(create_query)
call makes the insert statement work, so I guess it has something to do with the fast_executemany
option. Has anybody encountered a similar problem, or does anyone know what causes this behaviour? I'm happy about every hint, or about a different code logic to create a table if it doesn't exist and fill it directly from the DataFrame. Thanks!
Other useful code:
from contextlib import contextmanager

import pyodbc

@contextmanager
def get_connection(db_url, sql_type):
    conn = None
    try:
        if sql_type == 'mssql':
            conn = pyodbc.connect(db_url)
        yield conn
    finally:
        if conn is not None:
            conn.commit()
            conn.close()
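For completeness, one alternative logic I'd consider (an untested sketch, not a confirmed fix): pull the existence check out of the T-SQL batch entirely, so the CREATE TABLE can run and be committed in its own round trip before fast_executemany is switched on. The check below uses OBJECT_ID instead of sysobjects; table and column names are placeholders:

```python
target_table = 'SomeTable'  # placeholder table name
columns_sql = 'id bigint, name varchar(max)'  # placeholder column list

# OBJECT_ID returns the table's object id, or NULL if it does not exist
exists_query = f"SELECT OBJECT_ID(N'[dbo].[{target_table}]', N'U')"
create_query = f"CREATE TABLE [dbo].[{target_table}] ({columns_sql});"

print(exists_query)
# SELECT OBJECT_ID(N'[dbo].[SomeTable]', N'U')
```

The idea would be: run exists_query with a plain cursor.execute, fetchone() the result, and only if it is NULL execute create_query followed by conn.commit(); only then set cursor.fast_executemany = True and run the executemany. Whether that avoids the silent skip is exactly what I'd like to know.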