
I'm using Python with MySQLdb to perform some calculations on a large 75M row database. The basic program flow is to iterate through every row, process the data, then use another cursor to put new data back in the row. The processing is too complex for pure SQL.

db1 = MySQLdb.connect(host=HOST, user=USERNAME, passwd=PASSWORD, db=DATABASE)
db1.ping(True)
cursor1 = db1.cursor()
db2 = MySQLdb.connect(host=HOST, user=USERNAME, passwd=PASSWORD, db=DATABASE)
db2.ping(True)
cursor2 = db2.cursor()

cursor1.execute("SELECT * FROM table")
for row in cursor1:
    cursor2.execute("SELECT * FROM table")
    new_data = process_stuff(row)

    cursor2.execute("UPDATE table SET column = %s", (new_data,))

db2.commit()
db1.close()
db2.close()

This works, usually. Very often, though, it silently quits with no error message. Sometimes, but not always, it will give a "MySQL server has gone away" error.

It's also rather slow. I know it's a fairly large database, but it takes about 24 hours to process everything. It's a decent machine with 24 GB of RAM, but should it really take this long? I've spent hours fiddling with my.cnf, to no real gain.
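For reference, here is a stripped-down, self-contained sketch of the row-by-row flow described above, using the stdlib sqlite3 as a stand-in for MySQLdb so it runs without a live server. The table and column names (t, id, value, result) and the batch size are placeholders; the sketch assumes the real table has a primary key, so each UPDATE can be keyed to a single row, and it commits in batches rather than once at the very end.

```python
import sqlite3

def process_stuff(value):
    # stand-in for the real (too-complex-for-SQL) calculation
    return value * 2

# sqlite3 stands in for MySQLdb here; the schema is made up for illustration
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value REAL, result REAL)")
db.executemany("INSERT INTO t (id, value) VALUES (?, ?)",
               [(i, float(i)) for i in range(10)])
db.commit()

read_cur = db.cursor()   # one cursor streams the rows...
write_cur = db.cursor()  # ...a second one writes results back

read_cur.execute("SELECT id, value FROM t")
BATCH = 4
for n, (row_id, value) in enumerate(read_cur, 1):
    # update only this row, keyed by primary key
    write_cur.execute("UPDATE t SET result = ? WHERE id = ?",
                      (process_stuff(value), row_id))
    if n % BATCH == 0:
        db.commit()  # commit in batches instead of once at the very end
db.commit()

print(db.execute("SELECT result FROM t WHERE id = 3").fetchone()[0])
```

With MySQLdb the equivalent of the streaming read is a server-side cursor (`MySQLdb.cursors.SSCursor`), which avoids buffering all 75M rows client-side; the parameterized UPDATE uses `%s` placeholders instead of `?`.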

Rachie
  • By looping through the database in a nested loop, the time complexity is O(n²) at best. This will take a really long time when the database is "fairly large" – msw Aug 30 '15 at 20:25
  • This could explain the speed, but not the silent crashing. Why wouldn't it throw an error? – Rachie Aug 31 '15 at 01:52
  • The inner loop is executed 5×10¹⁵ times. For comparison, that 24 GB of memory you've got is only about 2×10¹⁰ B. I suspect you are running Out Of Memory somewhere and the kernel OOM mechanism is killing the database. See http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer for how to see if that happened. – msw Aug 31 '15 at 02:08
  • Why would it be using so much memory? Yes it's a big task, but it only has to process one line at a time. Doesn't it release the memory as it goes along? – Rachie Aug 31 '15 at 03:07
  • I don't know MySQL internals, but you do 5×10¹⁵ UPDATEs before committing, and yes, much of that goes into a journal on disk, but how much memory does that use? Dunno. Given the order of magnitude and the number of code layers that have to be traversed, a one-byte leak per update will eat your machine. I've given you a diagnostic for what I believe is the most likely cause, and you wish to debate whether or not it ought to crash. Well, it is crashing according to your claim, so whether it should or shouldn't, it is. – msw Aug 31 '15 at 04:31
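msw's 5×10¹⁵ figure in the thread above follows directly from the nested loop: the inner SELECT scans all n rows once for every one of the n outer rows. A quick back-of-the-envelope check, taking the 75M row count from the question:

```python
rows = 75_000_000       # table size from the question
visits = rows * rows    # inner SELECT touches every row, once per outer row
print(f"{visits:.1e}")  # 5.6e+15 row visits, i.e. msw's ~5x10^15 estimate
```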

0 Answers