
I am using Python 2.7, pyodbc and MySQL 5.5, on Windows.

I have a query which returns millions of rows, and I would like to process it in chunks using the fetchmany function.

Here is a portion of the code:

import pyodbc
connection = pyodbc.connect(
    "Driver={MySQL ODBC 5.1 Driver};Server=127.0.0.1;Port=3306;"
    "Database=XXXX;User=root;Password=;Option=3;")

cursor_1 = connection.cursor()
strSQLStatement = 'SELECT x1, x2 FROM X'

cursor_1.execute(strSQLStatement)  # the error occurs here

x1 = cursor_1.fetchmany(10)
print x1
connection.close()
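For context, the chunked processing I am aiming for looks roughly like this (`process_in_chunks` and the handler are hypothetical names of mine, not pyodbc API; this sketch does not by itself fix the memory error, since the driver may still buffer the whole result set client-side before the first fetchmany returns):

```python
def process_in_chunks(cursor, batch_size, handler):
    """Pull rows from an already-executed cursor in fixed-size batches."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # fetchmany returns an empty list when exhausted
            break
        for row in rows:
            handler(row)
```

With the cursor above this would be called as `process_in_chunks(cursor_1, 1000, my_handler)` right after the execute.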

My problem:

  1. I get the error "MySQL client ran out of memory"

  2. I guess that this is because cursor_1.execute tries to read everything into memory. I tried the following (one by one), but to no avail:

    1. In the user interface (ODBC administrator tool) I ticked "Don't cache results of forward-only cursors"
    2. connection.query("SET GLOBAL query_cache_size = 40000")

My question:

  1. Does pyodbc have a way to run the query and serve the results only on demand?

  2. The MySQL manual suggests invoking mysql with the --quick option. Can this also be done when not using the command line?

Thanks for your help.

P.S.: suggestions for an alternative MySQL module are also welcome, but I use Portable Python, so my choice is limited.

Martijn Pieters
user1043144

2 Answers


Use the LIMIT clause on the query string.

http://dev.mysql.com/doc/refman/5.5/en/select.html

By using

SELECT x1, x2 FROM X LIMIT 0, 1000

You'll only get the first 1,000 records. Then, by doing:

SELECT x1, x2 FROM X LIMIT 1000, 1000

you'd get the next 1,000 records (the first number is the offset, the second is the row count).

Loop this appropriately to get all your records. (I don't know Python, so I can't help with that part.)
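The loop alluded to above could be sketched in Python like this (`fetch_in_pages` and `fetch_page` are placeholder names of mine; `fetch_page` stands in for executing one query through pyodbc and returning its rows):

```python
def fetch_in_pages(fetch_page, page_size):
    """Collect all rows by issuing LIMIT offset, count queries until a
    page comes back short, i.e. the table is exhausted."""
    offset = 0
    all_rows = []
    while True:
        sql = "SELECT x1, x2 FROM X LIMIT %d, %d" % (offset, page_size)
        rows = fetch_page(sql)  # e.g. cursor.execute(sql); cursor.fetchall()
        all_rows.extend(rows)
        if len(rows) < page_size:  # short (or empty) page: done
            break
        offset += page_size
    return all_rows
```

In practice you would process each page as it arrives rather than accumulate everything, otherwise the client-side memory problem simply moves into your own list.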

FreudianSlip
  • Thanks. This would be an option of last resort (the nuclear option so to speak). – user1043144 Aug 23 '12 at 08:02
  • Just came by this comment and would like to expand on it. This is indeed the **nuclear option** because you're not telling MySQL in which order to return the rows. If something was inserted between the two queries you'd get non-deterministic results, because of the way MySQL returns rows with no ORDER BY. And ORDER BY with LIMIT is very inefficient, if that has not been fixed. A better bet would be `BETWEEN N AND M` and keeping track of N and M yourself. – ekydfejj Jan 09 '19 at 21:05

Using MySQLdb with SSCursor will solve your issues.

Unfortunately the documentation isn't great, but SSCursor is mentioned in the user guide and you can find an example in this Stack Overflow question.
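A minimal sketch of what that looks like, assuming MySQLdb is installed (`stream_rows` is a name of mine and the connection parameters are placeholders; SSCursor keeps the result set on the MySQL server and streams rows instead of buffering them all client-side):

```python
def stream_rows(sql, batch_size=1000, **connect_kwargs):
    """Generator that yields rows of a large result set one at a time,
    using a server-side cursor so the client never holds the full set."""
    import MySQLdb
    import MySQLdb.cursors

    conn = MySQLdb.connect(cursorclass=MySQLdb.cursors.SSCursor,
                           **connect_kwargs)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                yield row
    finally:
        conn.close()
```

Usage would be along the lines of `for x1, x2 in stream_rows('SELECT x1, x2 FROM X', host='127.0.0.1', user='root', db='XXXX'): ...` — but note the caveat from the comments below: the server holds the result set for the lifetime of the cursor, so you must consume it promptly.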

Mihai Stan
  • Thanks Mihai. I would very much like to do this using pyodbc or another DB API that works unproblematically on Windows (with the portable version I use). This is not the case for MySQLdb on Windows. But thanks anyway. – user1043144 Aug 23 '12 at 08:00
  • Did you try PyMySQL (a pure-Python client)? It also supports SSCursor, but I have no personal experience with it. – Mihai Stan Aug 23 '12 at 08:08
  • I finally installed ActivePython and MySQLdb, but it does not solve the problem; there is still a memory problem. In fact, the documentation says that SSCursor stores the result set on the server (I guess it fetches everything). I also tried setting the array size (cursor_1.arraysize = 2), but it does not help. P.S.: I did not know that ActivePython is actually portable (there is a zip version without an installer). MySQLdb is called MySQL-python there. Another related question: 337479. I will use sqlite. – user1043144 Aug 23 '12 at 13:59