
I'm inserting data from one database to another, so I have 2 connections (Conn1 and Conn2). Below is the code (using pypyodbc).

 import pypyodbc

 # (Connection setup omitted; Conn1_Cursor and Conn2_Cursor are created earlier.)
 Conn1_Query = "SELECT column FROM Table"

 Conn1_Cursor.execute(Conn1_Query)
 Conn1_Data = Conn1_Cursor.fetchall()

 Conn1_array = []

 for row in Conn1_Data:
     Conn1_array.append(row)

The above part runs very quickly.

 stmt = "INSERT INTO TABLE(column) values (?)"
 Conn2_Cursor.executemany(stmt, Conn1_array)
 Conn2.commit()

This part is extremely slow. I've also tried a for loop that inserts one row at a time with cursor.execute, but that is also very slow. What am I doing wrong, and is there anything I can do to speed it up? Thanks for taking a look.

I should also add that the Conn1 data is only ~50k rows. (There is some setup code at the beginning that I didn't include because it isn't pertinent to the question.) The insert takes about 15 minutes; for comparison, writing the same output to a CSV file takes about 25 seconds.

Epausti
1 Answer


Yes: executemany under pypyodbc sends a separate INSERT statement for each row, so it behaves just the same as calling execute once per row in a loop. Given that pypyodbc is no longer under active development, that is unlikely to change.
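That is, the executemany call in the question behaves roughly like this (a sketch using the question's own names):

 for params in Conn1_array:
     Conn2_Cursor.execute(stmt, params)
 Conn2.commit()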

However, if you are using a compatible driver like "ODBC Driver xx for SQL Server" and you switch to pyodbc then you can use its fast_executemany option to speed up the inserts significantly. See this answer for more details.
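As an illustration, here is a minimal sketch of the pyodbc approach. The DSN, driver version, server, database, and table names below are placeholders, not taken from the question; substitute your own:

 import pypyodbc
 import pyodbc

 # Hypothetical connection details; replace with your own.
 src = pypyodbc.connect("DSN=SourceDSN")
 dst = pyodbc.connect(
     "DRIVER={ODBC Driver 17 for SQL Server};"
     "SERVER=DestServer;DATABASE=DestDB;Trusted_Connection=yes;"
 )

 src_crsr = src.cursor()
 src_crsr.execute("SELECT column FROM Table")
 # Convert rows to plain tuples so pyodbc can bind them as parameters
 rows = [tuple(row) for row in src_crsr.fetchall()]

 dst_crsr = dst.cursor()
 dst_crsr.fast_executemany = True  # pack rows into bulk parameter arrays
 dst_crsr.executemany("INSERT INTO Table(column) VALUES (?)", rows)
 dst.commit()

With fast_executemany enabled, pyodbc binds the parameters as arrays and sends them to the server in large batches rather than one round trip per row, which is where the speedup comes from.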

Gord Thompson