
I have the following method that accesses a MySQL database. The query gets executed on a server where I don't have access to change anything (such as increasing memory). I am new to generators and have started to read more about them, and I thought I could convert this method to use a generator.

def getUNames(self):
    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    global_user_list = []
    try:
        self.gdbCursor.execute(globalUserQuery)
        rows = self.gdbCursor.fetchall()
        for row in rows:
            uName = unicode(row['gu_name'], 'utf-8')
            global_user_list.append(uName)
        return global_user_list
    except Exception, e:
        traceback.print_exc()

And I use this method as follows:

for user_name in getUNames():
...

This is the error that I was getting from the server side:

Out of memory (Needed 725528 bytes)
Traceback (most recent call last):
...
packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (2008, 'MySQL client ran out of memory')

How should I be using a generator to avoid this?

self.gdbCursor.execute(globalUserQuery)
while True:
    row = self.gdbCursor.fetchone()
    if row is None:
        break
    yield row

I'm not sure if the above is the right way to go, since I am expecting a list as the result of my database method. I think what would be great is to fetch a chunk of rows from the query and return it as a list, and once that chunk is done, have the generator give the next set, as long as the query keeps returning results.
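
Something along these lines is roughly what I had in mind, as a sketch using `fetchmany` (the chunk size of 1000 is just a guess, and I'm not sure this actually solves the memory problem):

def getUNames(self):
    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    try:
        self.gdbCursor.execute(globalUserQuery)
        while True:
            # Pull a bounded chunk of rows at a time instead of fetchall()
            rows = self.gdbCursor.fetchmany(1000)
            if not rows:
                break
            for row in rows:
                yield unicode(row['gu_name'], 'utf-8')
    except Exception:
        traceback.print_exc()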


1 Answer


With MySQLdb, the default cursor loads the entire result set into client-side memory when the call to cursor.execute(..) is made. For a large query, that can exhaust the client's memory (the "MySQL client ran out of memory" error you are seeing) whether or not you use a generator.

Instead, use an SSCursor or SSDictCursor. These keep the result set on the server side and allow you to iterate through the items in the result set on the client side:

import MySQLdb  
import MySQLdb.cursors as cursors
import traceback

def getUNames(self):
    # You may of course want to define `self.gdbCursor` somewhere else...
    conn = MySQLdb.connect(..., cursorclass=cursors.SSDictCursor)
    #                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #      Set the cursor class to SSDictCursor here (a dict-style server-side
    #      cursor, so the row['gu_name'] lookup below keeps working)
    self.gdbCursor = conn.cursor()

    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    try:
        self.gdbCursor.execute(globalUserQuery)
        for row in self.gdbCursor:
            uName = unicode(row['gu_name'], 'utf-8')
            yield uName
    except Exception as e:
        traceback.print_exc()

There isn't much documentation on the difference between the default Cursor and the SSCursor. The best source I know is the docstrings of the Cursor Mixin classes themselves:

The default cursor uses a CursorStoreResultMixIn:

In [2]: import MySQLdb.cursors as cursors
In [8]: print(cursors.CursorStoreResultMixIn.__doc__)
This is a MixIn class which causes the entire result set to be
    stored on the client side, i.e. it uses mysql_store_result(). If the
    result set can be very large, consider adding a LIMIT clause to your
    query, or using CursorUseResultMixIn instead.

and the SSCursor uses a CursorUseResultMixIn:

In [9]: print(cursors.CursorUseResultMixIn.__doc__)
This is a MixIn class which causes the result set to be stored
    in the server and sent row-by-row to client side, i.e. it uses
    mysql_use_result(). You MUST retrieve the entire result set and
    close() the cursor before additional queries can be peformed on
    the connection.
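
That last point is worth emphasizing: with a server-side cursor you must consume (or close) the result set before running another query on the same connection. A minimal sketch of what that looks like, with placeholder connection parameters:

import MySQLdb
import MySQLdb.cursors as cursors

# Placeholder connection parameters -- adjust for your own server/database.
conn = MySQLdb.connect(host='localhost', user='user', passwd='passwd', db='mydb',
                       cursorclass=cursors.SSDictCursor)
cursor = conn.cursor()

cursor.execute('SELECT gu_name FROM globaluser WHERE gu_locked = 0')
names = [unicode(row['gu_name'], 'utf-8') for row in cursor]  # drain the result set
cursor.close()                                                # then close the cursor

# Only now is it safe to issue another query on this connection.
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) FROM globaluser')
print(cursor.fetchone())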

Since I changed getUNames into a generator, it would be used like this:

for user_name in self.getUNames():
    ...
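
As the CursorStoreResultMixIn docstring suggests, another option is to page through the query with a LIMIT clause, so each execute() stores only one bounded batch on the client. A rough sketch of that approach, assuming `self.gdbCursor` is the same dict-style client-side cursor as in the question (the batch size is arbitrary, and the ORDER BY is there just to keep the paging stable):

def getUNames(self):
    batch = 5000   # arbitrary batch size
    offset = 0
    while True:
        # Each execute() stores at most `batch` rows on the client side.
        self.gdbCursor.execute(
            'SELECT gu_name FROM globaluser WHERE gu_locked = 0 '
            'ORDER BY gu_name LIMIT %s OFFSET %s', (batch, offset))
        rows = self.gdbCursor.fetchall()
        if not rows:
            break
        for row in rows:
            yield unicode(row['gu_name'], 'utf-8')
        offset += batch
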
  • A very good answer on a little-known topic. I took the liberty of editing your code sample to put emphasis on the part where the cursorclass is set (to be honest, I myself spent a few *minutes* (!) trying to find where to set the cursor type o_O) – Sylvain Leroux Aug 12 '13 at 21:46
  • @Sylvain Leroux: thank you very much. – unutbu Aug 12 '13 at 21:58
  • @unutbu how do you set rows here ` self.gdbCursor.execute(globalUserQuery) for row in rows:` – add-semi-colons Aug 13 '13 at 14:27
  • Sorry; that was an error on my part. It should have been `for row in self.gdbCursor`. The cursor is an iterator! – unutbu Aug 13 '13 at 14:36
  • I see, that's probably why I still end up with the memory issue, since I left the `fetchall` in – add-semi-colons Aug 13 '13 at 14:38
  • @unutbu I see a strange behavior after switching to the generator. My data set has dropped drastically. Earlier my code failed, but by that point it had spit out 8500 records; now it is only spitting out 900 records, and nothing other than the generator has changed. – add-semi-colons Aug 15 '13 at 00:42
  • Sorry, I can't think of any reason why that should be. – unutbu Aug 15 '13 at 11:34
  • Very interesting. What if I want to load my results into a [set](https://docs.python.org/2/library/stdtypes.html#set)? I'm forced to go through the cursor with a loop and add them to the set myself, right? Sure, direct conversion of the cursor to a set simply by `set()` would only be syntactic sugar. – Nemo Jun 02 '16 at 11:16
  • Very useful answer! May I point out that in Python 3 MySQLdb no longer exists? It seems pymysql is the equivalent, with the SSCursor option available. – c24b Jul 14 '20 at 20:52