
I have been working with bulk inserts on our Azure SQL Pool (formerly SQL DW) using this code:

def save_sql_bulk(self, table_name, schema, file_path):
    logging.info(f"Starting process to save data to table: {table_name}")

    # sy_server / sy_cnxn are connection settings defined elsewhere in the module
    logging.info(f"Connecting to sink database: {sy_server}")
    cnxn_sy = pyodbc.connect(sy_cnxn)

    logging.info(f"Appending to table: {table_name} with schema: {schema} from path: {file_path}")
    query = f'''
            COPY INTO {schema}.{table_name}
            FROM 'https://siidatalake.blob.core.windows.net/containername/{file_path}'
            WITH (
                FIRSTROW = 2,
                FILE_TYPE = 'CSV',
                MAXERRORS = 100,
                FIELDTERMINATOR = ';',
                CREDENTIAL = (IDENTITY = 'MANAGED IDENTITY')
            )'''

    logging.info(f"Executing query: \n {query} \n")
    print(f"Executing query: \n {query} \n")
    cnxn_sy.execute(query)
    cnxn_sy.commit()
    cnxn_sy.close()

    logging.info("Done appending without problems.")

All in all, I have no problems whatsoever in most cases, but from time to time I encounter the following error while executing this statement:

Exception while processing [file_path] with error [('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]111213;The current transaction has aborted, and any pending changes have been rolled back. Cause: 111208;An internal error occurred that prevents further processing of this command: 'Object reference not set to an instance of an object.'. (111213) (SQLExecDirectW)")]

Sometimes this even happens with a process that had already been working for a while. What could this mean? My table definitions have not changed, and the process fails even with data that was already processed successfully before. For more context, I'm inserting a couple of million rows. I read in another thread that it could be database related, but in my case the databases are maintained by Azure, so I don't think it's a problem that will be fixed with a database patch or reinstall. Any ideas?

Thanks.

frammnm
  • Does this answer your question? [basic pyodbc bulk insert](https://stackoverflow.com/questions/37008848/basic-pyodbc-bulk-insert) – Ecstasy Apr 06 '22 at 04:39
  • You can refer to [BULK INSERT (Transact-SQL)](https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15), [How to speed up bulk insert to MS SQL Server using pyodbc](https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc) and [Speed up Bulk inserts to SQL db using Pandas and Python](https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990) – Ecstasy Apr 06 '22 at 04:41
  • Thanks, but I already checked those threads for reference before asking this question. They discuss how to make a bulk insert, not why I would get this object reference error. – frammnm Apr 08 '22 at 06:34

1 Answer


The solution was to enable the autocommit option on the pyodbc connection. It seems the SQL Pool from Azure requires it to avoid this issue.
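For reference, here is a minimal sketch of the change, reusing the sy_cnxn connection string from the question. With autocommit enabled, each statement commits on its own, so the explicit commit() call can be dropped:

    import pyodbc

    # Open the connection in autocommit mode; by default pyodbc uses
    # autocommit=False, which wraps each statement in a manual-commit transaction.
    cnxn_sy = pyodbc.connect(sy_cnxn, autocommit=True)

    cnxn_sy.execute(query)  # no cnxn_sy.commit() needed in autocommit mode
    cnxn_sy.close()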

frammnm