I have a CSV input file with approximately 4 million records. The insert has been running for over 2 hours and still has not finished, and the database is still empty.
Any suggestions on how to actually insert the values (using INSERT INTO) and make it faster, e.g. by breaking the insert into chunks?
I'm pretty new to Python.
- csv file example
43293,cancelled,1,0.0,
1049007,cancelled,1,0.0,
438255,live,1,0.0,classA
1007255,xpto,1,0.0,
- python script
import csv
import pymysql

def csv_to_DB(xing_csv_input, db_opts):
    print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
    conn = pymysql.connect(**db_opts)
    cur = conn.cursor()
    try:
        with open(xing_csv_input, newline='') as csvfile:
            csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
            # one INSERT statement per CSV row
            for row in csv_data:
                insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
                cur.execute(insert_str, row)
        # single commit after all rows have been executed
        conn.commit()
    finally:
        conn.close()
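For reference, this is roughly what I imagine "breaking the insert into chunks" could look like: buffer rows in a list and send each chunk in one call with pymysql's executemany, committing per chunk. This is an untested sketch; the function name csv_to_DB_chunked and the BATCH_SIZE value are placeholders I made up, and it assumes the same table_x schema as above.

import csv
import pymysql

BATCH_SIZE = 1000  # placeholder chunk size

def csv_to_DB_chunked(xing_csv_input, db_opts):
    insert_str = ("INSERT INTO table_x (ID, desc, desc_version, val, class) "
                  "VALUES (%s, %s, %s, %s, %s)")
    conn = pymysql.connect(**db_opts)
    cur = conn.cursor()
    try:
        with open(xing_csv_input, newline='') as csvfile:
            csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
            batch = []
            for row in csv_data:
                batch.append(row)
                if len(batch) >= BATCH_SIZE:
                    # send the whole chunk in one round trip, then commit it
                    cur.executemany(insert_str, batch)
                    conn.commit()
                    batch = []
            if batch:
                # flush the last partial chunk
                cur.executemany(insert_str, batch)
                conn.commit()
    finally:
        conn.close()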
UPDATE: Thanks for all the input. As suggested, I tried using a counter to insert in batches of 100 and a smaller csv data set (1000 lines). The problem now is that only 100 records are inserted, even though the counter reaches 100 several times (10 x 100).
code change:
def csv_to_DB(xing_csv_input, db_opts):
    print("Inserting csv file {} to database {}".format(xing_csv_input, db_opts['host']))
    conn = pymysql.connect(**db_opts)
    cur = conn.cursor()
    count = 0
    try:
        with open(xing_csv_input, newline='') as csvfile:
            csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in csv_data:
                count += 1
                print(count)
                insert_str = "INSERT INTO table_x (ID, desc, desc_version, val, class) VALUES (%s, %s, %s, %s, %s)"
                if count >= 100:
                    cur.execute(insert_str, row)
                    print("count100")
                    conn.commit()
                    count = 0
                if not row:
                    cur.execute(insert_str, row)
                    conn.commit()
    finally:
        conn.close()
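What I was trying to achieve with the counter is roughly the following (a sketch of the intent, not the actual code above): execute every row, but only commit once every 100 rows, plus a final commit for whatever is left after the loop. Same table_x schema as before; csv_to_DB_counter is just an illustrative name.

import csv
import pymysql

def csv_to_DB_counter(xing_csv_input, db_opts):
    insert_str = ("INSERT INTO table_x (ID, desc, desc_version, val, class) "
                  "VALUES (%s, %s, %s, %s, %s)")
    conn = pymysql.connect(**db_opts)
    cur = conn.cursor()
    count = 0
    try:
        with open(xing_csv_input, newline='') as csvfile:
            csv_data = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in csv_data:
                cur.execute(insert_str, row)  # execute every row, not only the 100th
                count += 1
                if count >= 100:
                    conn.commit()             # commit once per 100 rows
                    count = 0
        conn.commit()                         # commit whatever is left after the loop
    finally:
        conn.close()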