
I'm trying to write a Python script that parses a file and updates a database with the new values obtained from it. My code looks like this:

from datetime import datetime

startTime = datetime.now()

db = <Get DB Handle>

counter = 0

with open('CSV_FILE.csv') as csv_file:
    data = csv_file.read().splitlines()
    for line in data:
        data1 = line.split(',')
        execute_string = "update table1 set col1=" + data1[1] + 
                         " where col0 is '" + data1[0] + "'"
        db.execute(execute_string)
        counter = counter+1
        if(counter % 1000 == 0 and counter != 0):
            print ".",

print ""

print datetime.now() - startTime

But that operation took about 10 minutes to finish. Is there any way I can tweak my SQL query to speed it up?

user3666471
  • How big is your file? The speed problem could come from the " where col0 is '" + data1[0] + "'" part, since the database has to look up which row to update for every line. An alternative approach is to import the file into the db first and do a bulk update. – Bulat Jul 29 '15 at 16:50
  • It should be faster to load the CSV file directly into a database table and then do the update based on the table. In other words, do the hard lifting work in the DB rather than in Python. – Gordon Linoff Jul 29 '15 at 16:55
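
A minimal sketch of the staging-table approach the comments above describe, assuming a MySQL database reached through pymysql; the staging table name, connection details, and column types are placeholders and not part of the original question:

import pymysql

# Connection details are placeholders.
conn = pymysql.connect(host='localhost', user='user', password='pw',
                       db='mydb', local_infile=True)
cur = conn.cursor()

# Load the whole CSV into a temporary staging table in one shot.
cur.execute("CREATE TEMPORARY TABLE staging_table1 (col0 VARCHAR(64), col1 INT)")
cur.execute("LOAD DATA LOCAL INFILE 'CSV_FILE.csv' INTO TABLE staging_table1 "
            "FIELDS TERMINATED BY ',' (col0, col1)")

# One set-based UPDATE instead of one UPDATE per CSV line.
cur.execute("UPDATE table1 JOIN staging_table1 USING (col0) "
            "SET table1.col1 = staging_table1.col1")
conn.commit()

With an index on table1.col0, the join-based update lets the database do the per-row lookups internally instead of once per client round trip.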

2 Answers


Read the lines in batches (say, 1000 at a time) and use a bulk update query for MySQL. That should be faster than the current approach, since you'll be updating more rows with fewer queries.
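
A minimal sketch of the batched approach this answer suggests, assuming MySQL via pymysql and that col0 is a PRIMARY KEY or UNIQUE column; the table and column names follow the question, while the connection details and the batch size of 1000 are placeholders:

import csv
import pymysql

conn = pymysql.connect(host='localhost', user='user', password='pw', db='mydb')
cur = conn.cursor()

# INSERT ... ON DUPLICATE KEY UPDATE acts as a bulk update when the keys already exist.
UPSERT = ("INSERT INTO table1 (col0, col1) VALUES (%s, %s) "
          "ON DUPLICATE KEY UPDATE col1 = VALUES(col1)")

with open('CSV_FILE.csv') as csv_file:
    rows = [(r[0], r[1]) for r in csv.reader(csv_file)]

# One multi-row statement per 1000 rows instead of one UPDATE per row.
for i in range(0, len(rows), 1000):
    cur.executemany(UPSERT, rows[i:i + 1000])
conn.commit()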

NIlesh Sharma

I agree you need batching. Have a look at this question and its top answer: How can I do a batch insert into an Oracle database using Python?
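
For context, a minimal sketch of what the linked answer's executemany technique looks like applied to this question, assuming an Oracle database and cx_Oracle; the connection details are placeholders:

import csv
import cx_Oracle

conn = cx_Oracle.connect('user', 'password', 'localhost/orclpdb1')
cur = conn.cursor()

with open('CSV_FILE.csv') as csv_file:
    # Bind order is (col1, col0) to match the :1 and :2 placeholders below.
    rows = [(r[1], r[0]) for r in csv.reader(csv_file)]

# executemany sends the whole batch with array binds instead of one statement per row.
cur.executemany("update table1 set col1 = :1 where col0 = :2", rows)
conn.commit()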
