
My code loads a CSV into MS SQL Server via pyodbc. However, after cleaning, the CSV is 300 MB+ with about 3 million rows, and the load becomes painfully slow. This is my code. Do you know of any way to speed up the process? Thank you.

def impserv1251(self):
    try:
        import pyodbc
        import os
        import sys
        import struct
        import fileinput
        import csv
        import pandas as pd

        serv = 'xxxxxxxxx\\xxxxxxxx'
        datab = 'xxxxxxxxxxxxxxxx'
        cnxn = pyodbc.connect('Trusted_Connection=yes', driver='{ODBC Driver 13 for SQL Server}', server=serv, database=datab)
        print(cnxn)

        cwd = 'xxxxxxxxxxxxxxxxxxxxxxxxx'
        directory = cwd + '\\FINAL\\CSV'

        # Read the cleaned CSV into a DataFrame, forcing every column to str
        for file in os.listdir(directory):
            if file.endswith("1251_FINAL.csv"):
                data = pd.read_csv(directory + '\\' + "1251_FINAL.csv", sep=",",
                                   usecols=['SYS', 'MANDT', 'AGR_NAME', 'OBJECT', 'AUTH', 'FIELD', 'LOW', 'HIGH', 'DELETED'],
                                   encoding='latin-1',
                                   dtype={'SYS': str, 'MANDT': str, 'AGR_NAME': str, 'OBJECT': str, 'AUTH': str,
                                          'FIELD': str, 'LOW': str, 'HIGH': str, 'DELETED': str})
                data = data.fillna(value='')

        # Insert row by row -- this is the slow part
        cursor = cnxn.cursor()
        for pos, row in data.iterrows():
            cursor.execute("INSERT INTO dbo.AGR_1251_ALL([SYS], [MANDT], [AGR_NAME], [OBJECT], [AUTH], [FIELD], [LOW], [HIGH], [DELETED]) values(?,?,?,?,?,?,?,?,?)",
                           row['SYS'], row['MANDT'], row['AGR_NAME'], row['OBJECT'], row['AUTH'], row['FIELD'], row['LOW'], row['HIGH'], row['DELETED'])
        cnxn.commit()
        cursor.close()
        cnxn.close()
    except Exception as exc:
        print(exc)
        raise
Have you tried batching your commands, then executing them all at once? Example: https://blogs.oracle.com/opal/efficient-and-scalable-batch-statement-execution-in-python-cx_oracle – pythomatic Jul 18 '19 at 18:00
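
Along the lines of that comment, here is a minimal sketch of how the per-row loop could be replaced by sending batched parameter sets with pyodbc's executemany and the fast_executemany flag (available in pyodbc 4.0.19+ with the Microsoft ODBC drivers). The bulk_insert helper name and the chunk size are illustrative, not from the original post; the table and column names are taken from the question.

import pyodbc

def bulk_insert(cnxn, data, chunk_size=10000):
    """Insert the cleaned DataFrame into dbo.AGR_1251_ALL in batches instead of row by row."""
    cursor = cnxn.cursor()
    # fast_executemany packs all parameter sets into bulk round trips
    # instead of one round trip per row.
    cursor.fast_executemany = True
    sql = ("INSERT INTO dbo.AGR_1251_ALL"
           "([SYS], [MANDT], [AGR_NAME], [OBJECT], [AUTH], [FIELD], [LOW], [HIGH], [DELETED]) "
           "VALUES (?,?,?,?,?,?,?,?,?)")
    # Convert the DataFrame to a list of parameter sequences once, up front.
    rows = data[['SYS', 'MANDT', 'AGR_NAME', 'OBJECT', 'AUTH',
                 'FIELD', 'LOW', 'HIGH', 'DELETED']].values.tolist()
    # Send the parameters in chunks so memory stays bounded; 10000 is an arbitrary batch size.
    for start in range(0, len(rows), chunk_size):
        cursor.executemany(sql, rows[start:start + chunk_size])
    cnxn.commit()
    cursor.close()

With 3 million rows this replaces millions of individual execute calls with a few hundred bulk calls, which is usually where most of the time goes.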

0 Answers