
The popular Python MySQL module MySQLdb seems to have a memory leak. Here's the code:

import time
import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(...)
cursor = conn.cursor(cursorclass=MySQLdb.cursors.DictCursor)
sql = "select * from `test`"
cursor.execute(sql)  # leak starts here
cursor.close()
conn.close()
time.sleep(20)

Assume `test` is a table with a billion records. I ran the Python code and executed

ps aux | awk '{print $2, $4, $11}' | grep python

at the same time. The result is that memory use increased to 47.0% and never went back down, even after I closed the cursor and the connection. Any ideas?

Chenglu
    http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm – unutbu Mar 26 '14 at 10:31
  • I had the same problem. It has persisted for years without finding a solution. Instead, I just wrote a cronjob that reboots the server when the memory gets high. It's a hack, but it works. – Marc Maxmeister Mar 17 '15 at 02:20

1 Answer


In this post, Fredrik Lundh explains why memory may not get returned to the system, even though it is not a memory leak. Near the bottom he explains why (in Python2) range(50*1024*100) may consume a lot of memory that will not get freed, even after deleting the list. And he mentions that using xrange is a way to avoid the memory problem in the first place.
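The lazy-versus-materialized distinction can be demonstrated without a database. This is a minimal sketch using Python 3, where range is lazy in the same way Python 2's xrange was; the sizes shown are the objects' own footprints, not total process memory:

```python
import sys

# Materializing 5,120,000 ints builds a list whose pointer array alone
# is tens of megabytes.
big_list = list(range(50 * 1024 * 100))

# The lazy range object stores only start/stop/step, regardless of length.
lazy_range = range(50 * 1024 * 100)

assert sys.getsizeof(big_list) > 10 * 1024 * 1024   # list: tens of MB
assert sys.getsizeof(lazy_range) < 100              # range: a few dozen bytes
```

The same principle is what makes a server-side cursor attractive: the full result set is never materialized as a Python object in the first place.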

Similarly, using an SSDictCursor instead of a DictCursor may be a way to avoid the memory problem in your case. An SSDictCursor causes the MySQL server to retain the result set on the server side, and the cursor will fetch rows from the result set one-at-a-time only as needed:

import MySQLdb
import MySQLdb.cursors as cursors
conn = MySQLdb.connect(..., cursorclass=cursors.SSDictCursor) #1
cursor = conn.cursor()
cursor.execute('select * from test')  #2
for row in cursor:                    #3
    print(row)
conn.close()
  1. Note the cursorclass=cursors.SSDictCursor in the call to connect.
  2. With a DictCursor (or any non-SS cursor), this call to execute would cause MySQLdb to load the entire result set into a Python object (such as a list of dicts).
  3. With an SSDictCursor, MySQLdb retrieves the rows one at a time.

So this will avoid the memory build-up problem, provided you don't need to hold the entire result set in Python all at once.
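If you want to experiment with the iterate-don't-fetchall pattern without a MySQL server, the standard library's sqlite3 module is a convenient analogue: its cursors also yield rows as you iterate. This sketch uses an in-memory database and illustrative table contents, not the original poster's data:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table test (id integer, name text)')
conn.executemany('insert into test values (?, ?)',
                 [(i, 'row%d' % i) for i in range(1000)])

cursor = conn.execute('select * from test')
total = 0
for row in cursor:      # rows are pulled one at a time as you iterate;
    total += row[0]     # no list of all rows is ever built in Python
conn.close()
```

The loop body here just sums the ids; the point is that each `row` can be processed and discarded before the next is fetched.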

Note also that when using an SSCursor or SSDictCursor, "No new queries can be issued on the connection until the entire result set has been fetched." Cursors from two different connections can be used concurrently. That may not be an issue for you, but it's something to be aware of.

You might also want to check out oursql, an alternative DB adapter for MySQL. oursql cursors are server-side cursors which fetch lazily by default.

unutbu
  • Wait a minute, is there a way to release the memory for the system by hand in the runtime? – Chenglu Mar 31 '14 at 07:55
  • Unfortunately, [the only reliable way to free memory is to terminate the process](http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python/1316799#1316799). – unutbu Mar 31 '14 at 10:58
  • It's not impossible to release memory, you just need to terminate the process. That means you can run a memory-hungry function in its own process (using the multiprocessing module) and release the memory by causing that function to end. – unutbu Apr 03 '14 at 11:43
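That multiprocessing approach can be sketched as follows. The function names are illustrative, not from the original post; the idea is that any memory allocated inside the worker is returned to the operating system when the worker process exits:

```python
import multiprocessing

def memory_hungry(n):
    # All of this memory is allocated in the child process.
    big = list(range(n))
    return sum(big)

def run_in_child(n):
    # Run memory_hungry in a one-worker pool; when the pool is closed,
    # the child process terminates and its memory is released to the OS.
    with multiprocessing.Pool(processes=1) as pool:
        return pool.apply(memory_hungry, (n,))

if __name__ == '__main__':
    print(run_in_child(10 ** 6))
```

Only the (small) return value crosses back into the parent process, so the parent's memory footprint stays flat no matter how large `big` was.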