
I'm trying to write a dynamic SQL export function for pandas DataFrames. So far I'm reading the dataframe's column dtypes to build the "create table" query. Since T-SQL doesn't have CREATE TABLE IF NOT EXISTS functionality, I had to write this workaround. The code looks like this, where data_df is a non-empty pandas DataFrame:

table_columns = data_df.columns
target_table = 'Some table name'             
col_dtype = []
data_coldtypes = data_df.dtypes.apply(lambda x: x.name).to_dict()

for k, v in data_coldtypes.items():
    if 'int' in v.lower():
        col_dtype.append(k + ' bigint')
    elif 'float' in v.lower():
        col_dtype.append(k + ' float')
    elif 'datetime' in v.lower():
        col_dtype.append(k + ' datetime')
    else:
        col_dtype.append(k + ' varchar(max)')

create_query = f"""IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='{target_table}' AND xtype='U')
                    CREATE TABLE [dbo].[{target_table}] ({', '.join(col_dtype)});"
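
For reference, with a hypothetical dataframe containing an int, a float and a string column (made-up names, purely to illustrate the dtype mapping above), the rendered statement comes out roughly like this:

import pandas as pd

# hypothetical example frame, only for illustration
example_df = pd.DataFrame({'id': [1, 2], 'price': [9.99, 4.5], 'label': ['a', 'b']})
# running the mapping loop above against it yields a query along the lines of:
# IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='Some table name' AND xtype='U')
#     CREATE TABLE [dbo].[Some table name] (id bigint, price float, label varchar(max));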

Now I also want to fill the newly created table directly with the respective data of this dataframe, which is why I also create an INSERT query, looking like this:

insert_query = f"INSERT INTO [dbo].[{target_table}] " \
               f"([{'], ['.join(table_columns)}]) VALUES "\
               f"({', '.join(['?'] * len(table_columns))});"

# Transforming the dataframe to a tuple-dataset

dataset_tuple = [tuple(None if pd.isna(value) else value for value in row)
                 for row in data_df.to_numpy()]
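
The pd.isna check is there so that missing values end up as SQL NULLs. A tiny illustration (not real data):

import numpy as np

row = [1, np.nan, 'a']
tuple(None if pd.isna(value) else value for value in row)
# -> (1, None, 'a'), since pyodbc expects None for NULL parameters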

Each statement works on its own, but now comes the weird part which I don't understand. When I try to execute both statements with fast_executemany = True, the insert statement is skipped. The code looks like this, where get_connection is just a helper function that returns the connection for the respective server:

with get_connection(db_conn_url, 'mssql') as conn:
    with conn.cursor() as cursor:

        cursor.fast_executemany = True
        cursor.execute(create_query)
        cursor.executemany(insert_query, (dataset_tuple))
        print('TEST: inserted')

With this code the table is created successfully but not filled (the print statement at the end is never reached), and there is no error message. Commenting out either the cursor.fast_executemany option or the cursor.execute(create_query) command makes the insert statement work, so I guess it has something to do with the fast_executemany option. Did anybody encounter a similar problem or know what causes this behaviour? I'm happy about any hint or a different code logic to create a table if it doesn't exist and fill it directly based on the dataframe information. Thanks!

Other useful code:

import pyodbc
from contextlib import contextmanager

@contextmanager
def get_connection(db_url, sql_type):
    conn = None
    try:
        if sql_type == 'mssql':
            conn = pyodbc.connect(db_url)
            yield conn
    finally:
        conn.commit()
        conn.close()
  • How do you connect to your database? (what is the code behind `get_connection()`?) Do you enable auto commit with `autocommit=True`? – Corralien Apr 19 '23 at 09:00
  • Try with `conn = pyodbc.connect(db_url, autocommit=True)` – Corralien Apr 19 '23 at 09:14
  • Thanks, but no changes. I also already tried a `conn.commit()` in the with-statement after executing the `create_query`, but no success. Also starting a new with-statement with a new connection for the insert-query doesn't help. – codade Apr 19 '23 at 09:21
  • Ok tried it again and now it seems like the `cursor.fast_executemany = True` is not working anymore, no matter if with or without create_query. With `cursor.fast_executemany` turned off there's no problem. My pyodbc version is 4.0.39 – codade Apr 19 '23 at 09:37
  • Check this Q&A: https://stackoverflow.com/q/48006551/15239951. Does it solve your problem? – Corralien Apr 19 '23 at 09:52
  • You mean switching to turbodbc? I have to try it out, but it's still weird that suddenly the `fast_executemany` option is not working anymore. – codade Apr 19 '23 at 10:12
  • Turbodbc worked fine, although installation is quite painful on Windows. Thanks for your help! – codade May 09 '23 at 10:01
  • (Off Topic) Everything is quite painful on Windows :) Glad you solved your problem! – Corralien May 09 '23 at 10:19
  • I totally agree ;) – codade May 16 '23 at 17:23

1 Answer


After some more tests I figured out that the driver was the problem. Using ODBC Driver 17 for SQL Server in the db_url string solved it. turbodbc remains a good alternative, though.
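
For completeness, the connection string then looks roughly like this (server, database and credentials are placeholders):

import pyodbc

# placeholder values, adjust to your environment
db_conn_url = (
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=my_server;DATABASE=my_database;'
    'UID=my_user;PWD=my_password'
)
conn = pyodbc.connect(db_conn_url)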
