I am trying to speed up loading a large CSV file into a MySQL database. With the code below, loading a 4 GB file takes about 4 hours:
import csv
from datetime import datetime

# mydb is an open MySQL connection and cursor = mydb.cursor()
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    for row in csv_reader:
        cursor.execute(insert_sql, row)
        print(cursor.rowcount, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
    print("Committing the DB")
    mydb.commit()

cursor.close()
mydb.close()
I want to use executemany() to make this faster. For that, you have to pass a sequence of tuples as the second argument.

If I keep appending to a list on every row iteration, it eventually grows too large, I get out-of-memory errors, and the script crashes.

I am not able to get the length of csv_reader or csv_file to use in a range() call.

How can I loop through the CSV file 1000 rows at a time, store those rows in a list, pass the list to executemany(), then read the next 1000 rows, and so on until the end of the file?
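Something like the sketch below is what I have in mind, using itertools.islice to pull 1000 rows at a time (untested; the batch_size value is my own guess, and source, cursor, and mydb are the same objects as in the code above):

import csv
from itertools import islice

batch_size = 1000  # assumed batch size; I haven't tuned this

# source, cursor, and mydb are the same objects as in the code above
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    while True:
        # islice consumes at most batch_size rows from the reader per call,
        # so only one batch is ever held in memory
        batch = [tuple(row) for row in islice(csv_reader, batch_size)]
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
        mydb.commit()

cursor.close()
mydb.close()

Committing after each batch is also just a guess on my part; I don't know whether a single commit at the end would be better.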