I am trying to speed up loading a large CSV file into a MySQL database. Using this code it takes about 4 hours to load a 4GB file:

import csv
from datetime import datetime

with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    for row in csv_reader:
        cursor.execute(insert_sql, row)
        print(cursor.rowcount, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
    print("Committing the DB")
    mydb.commit()
cursor.close()
mydb.close()

I want to use the executemany() method to make this faster. For that, you have to pass a list of tuples as the second argument.

If I build the list across row iterations, it eventually grows too large, the script runs out of memory, and it crashes.

I am not able to get the length of csv_reader or csv_file to use in a range statement, since csv.reader is just an iterator over the file.

How can I loop through the CSV file 1000 rows at a time and store the result in a list, use it in executemany, then store the next 1000 rows, etc until the end of the CSV file?

bluethundr

2 Answers

If you need high-speed inserts into MySQL, you can try:

LOAD DATA LOCAL INFILE '/path/to/my_file.csv' INTO TABLE my_table;
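
If you are running this from Python rather than the mysql client, here is a minimal sketch using MySQL Connector/Python. The connection parameters and file path are placeholders; note that allow_local_infile must be enabled on the client and local_infile on the server, otherwise you get the ERROR 1148 mentioned in the comments below.

import mysql.connector

# Placeholder credentials; local_infile must also be enabled on the
# server side (SET GLOBAL local_infile = 1;).
mydb = mysql.connector.connect(
    host='localhost',
    user='user',
    password='password',
    database='billing',
    allow_local_infile=True,
)
cursor = mydb.cursor()

# '\\n' is written in the Python source so that MySQL receives the
# escape sequence '\n' inside the SQL string literal.
cursor.execute("""
    LOAD DATA LOCAL INFILE '/path/to/my_file.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
""")
mydb.commit()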

heliosk
  • Not having any luck there either. I tried putting local-infile=1 in my.ini and restarted the service. Now I'm getting this error: `ERROR 1148 (42000): The used command is not allowed with this MySQL version`. I tried putting it under both the [mysql] and [mysqld] sections and restarting the service each time. – bluethundr May 06 '19 at 15:09
  • I think this post should help you a little: https://stackoverflow.com/questions/18437689/error-1148-the-used-command-is-not-allowed-with-this-mysql-version – heliosk May 06 '19 at 15:41
  • Thanks! That did help. I am loading the file now. However, I do still have to learn how to handle this programmatically. Hope I can figure it out! – bluethundr May 06 '19 at 15:56
  • That was SUPER fast! I was done in about 10 minutes. However all the rows are NULL when I do a select. So that didn't quite work. :/ – bluethundr May 06 '19 at 16:04
  • Yep, it's supposed to be this fast. Try checking whether all of your MySQL columns match the columns of your .csv file. – heliosk May 06 '19 at 16:11
  • Ok, no they don't after the import. ALL of the fields are NULL. But when I load the data with the code snippet I posted in the OP, all of the columns are populated correctly. – bluethundr May 06 '19 at 16:30
  • For example if I examine the records in the database after loading the file directly into MySQL all the entries in the table look like this: https://pastebin.com/SiyPs8Yn – bluethundr May 06 '19 at 16:32
  • I'm trying again with this info: https://stackoverflow.com/questions/14127529/mysql-import-data-from-csv-using-load-data-infile – bluethundr May 06 '19 at 16:34
  • No luck that time either. MySQL reported success importing the file. However when I do selects from the table there is no data present: https://pastebin.com/7jLtx1kH – bluethundr May 06 '19 at 16:42
  • You can try to read the csv columns one by one to check whether the values are ok. Did you try to load by column? `LOAD DATA LOCAL INFILE csv_file INTO TABLE my_table COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\r\n' (COL1, COL2, COL3, COL4, COL5, ...)` – heliosk May 06 '19 at 17:19

A small hint:

In [1]: import itertools

In [2]: rows = iter(range(10))

In [3]: while True:
   ...:     batch = [*itertools.islice(rows, 3)]
   ...:     if not batch:
   ...:         break
   ...:     print(batch)
   ...:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
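
Applied to your actual loop, a minimal sketch might look like this (it assumes cursor and mydb are already connected as in your snippet, and uses the batch size of 1000 from your question):

import csv
import itertools

BATCH_SIZE = 1000  # rows per executemany() call

with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    while True:
        # islice pulls at most BATCH_SIZE rows from the reader;
        # an empty batch means the file is exhausted.
        batch = list(itertools.islice(csv_reader, BATCH_SIZE))
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
        mydb.commit()
cursor.close()
mydb.close()

Since islice only ever materializes BATCH_SIZE rows at a time, memory use stays flat no matter how large the file is.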

That said, I agree with @heliosk that a better solution is to use LOAD DATA INFILE for large files. You may also want to disable keys until the import has finished.

newtover