After digging further, it turns out this is normal behaviour for MySQL. From various sources, it appears that most of the work for a select is done in the execute phase, and during the fetch only the network transfer of the result takes place. I had spent so much time with Oracle (where the execute usually does almost nothing in practice and the bulk of the processing happens at fetch time) that I didn't realize MySQL could behave differently.
Depending on the context, a workaround that lets you iterate over the items without the up-front latency is to implement a paging system. It can be done by encapsulating smaller fetches in a Python generator. The downside is that we lose consistency of the data across calls, but this is acceptable in my case. Here is a basis for those interested in this approach. The adaptation needed to fetch the next page makes the SQL query messier and less maintainable, and ties your code to your primary key structure more than you might like, so you probably need to weigh the pros and cons before going for it. The good news is that this complexity can be hidden behind a generator.
import MySQLdb

database = MySQLdb.connect(passwd="x", host="dbserver", user="user", db="database", port=9999)

def get_next_item(database):  # definition of the generator encapsulating the paging system
    first_call = True
    mysql_cursor = database.cursor()
    nothing_more_found = False
    while not nothing_more_found:
        if first_call:
            # First page: no lower bound yet, just scan in primary key order.
            mysql_query = """select a, b, c, d, e, f, g from mytable use index (primary)
                             where a = %s order by a, b, c, d
                             limit 10000"""
            mysql_cursor.execute(mysql_query, ["AA"])
            first_call = False
        else:
            # Next pages: restart strictly after the last (b, c, d) already returned.
            mysql_query = """select a, b, c, d, e, f, g from mytable use index (primary)
                             where a = %s and ((b > %s) or (b = %s and c > %s) or (b = %s and c = %s and d > %s))
                             order by a, b, c, d
                             limit 10000"""
            mysql_cursor.execute(mysql_query, ["AA", last_b, last_b, last_c, last_b, last_c, last_d])
        if mysql_cursor.rowcount == 0:
            nothing_more_found = True
        for a, b, c, d, e, f, g in mysql_cursor:
            yield (a, b, c, d, e, f, g)
            last_b, last_c, last_d = b, c, d

for a, b, c, d, e, f, g in get_next_item(database):  # usage of the generator
    pass  # do something with each row here
An explanation of MySQL execute vs. fetch can be found in this post from Mike Lischke:
The fetch time purely measures how it took to transfer the result,
which has absolutely nothing to do with executing the query. The fetch
time can even vary each time you run the query. Why should your
network connection decide how good or bad your query is? Ok, one use
actually exists: if the query returns too many data the transfer takes
much longer. But even this is not entirely true, because sometimes
results are cached, so they can be sent out quicker.
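To see this split in practice, you can time the two phases separately with MySQLdb. This is a minimal sketch under the same assumptions as the example above (connection parameters, table and column names are placeholders):

import time
import MySQLdb

# Placeholder connection parameters, same style as the paging example.
database = MySQLdb.connect(passwd="x", host="dbserver", user="user", db="database", port=9999)
cursor = database.cursor()

start = time.time()
cursor.execute("select a, b, c, d, e, f, g from mytable where a = %s", ["AA"])
print("execute phase: %.3f s" % (time.time() - start))  # on MySQL, nearly all of the elapsed time shows up here

start = time.time()
rows = cursor.fetchall()
print("fetch phase:   %.3f s" % (time.time() - start))  # typically very fast by comparison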
On the other hand, for Oracle, during a select most of the action takes place during the fetch. This is explained by Tom Kyte himself here:
think of it this way
1) parse - pretty well defined, that is prepareStatement - we do a
soft or hard parse, compile the statement, figure out how to execute
it.
2) execute - we OPEN the statement. For an update, for a delete, for
an insert - that would be it, when you OPEN the statement, we execute
it. All of the work happens here.
for select it is more complex. Most selects will do ZERO work during
the execute. All we are doing is opening the cursor - the cursor is a
pointer to the space in the shared pool where the plan is, your bind
variable values, the SCN that represents the "as of" time for your
query - in short the cursor at this point is your context, your
virtual machine state, think of the SQL plan as if it were bytecode
(it is) executed as a program (it is) in a virtual machine (it is).
The cursor is your instruction pointer (where are you in the execution
of this statement), your state (like registers), etc. Normally, a
select does nothing here - it just "gets ready to rock and roll, the
program is ready to go, but not yet really started".
However, there are exceptions to everything - turn on trace and do a
select * from scott.emp FOR UPDATE. That is a select, but it is also
an update. You would see work done during the execute as well as the
fetch phase. The work done during the execute was that of going out
and touching every row and locking it. The work done during the fetch
phase was that of going out and retrieving the data back to the
client.
3) fetch - this is where we see almost all of the work for SELECTS
(and nothing really for the other DMLS as you do not fetch from an
update).
There are two ways a SELECT might be processed. What I call a "quick
return query" and a "slow return query"
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:275215756923#39255764276301
is an excerpt from Effective Oracle by Design describing this in
depth, but suffice to say a query of the form:
select * from one_billion_row_table;
would not copy the data anywhere, would not need to access the last
row before returning the first row. We would just read the data as you
fetch it from the blocks it resides on.
However, a query of the form:
select * from one_billion_row_table order by unindexed_column;
that we would probably have to read the last row before returning the
first row (since the last row read could well be the first row
returned!) and we'd need to copy that somewhere (temp, sort area
space) first.
In the case of the first query, if you:
parsed it (little work parsing), opened it (no real work, just getting
ready), fetched 1 row and closed it
you would see VERY little work performed in the fetch phase, we'd just
have to read one block probably to return the first record.
However, do the same steps against the second query and you would see
the fetch of a single row do a TON of work - since we have to find the
last row before the first can be returned.
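For completeness, the mirror-image experiment can be run on the Oracle side. This is a hedged sketch using the cx_Oracle driver (the connection details, table and column are placeholders taken from the quote above): with an order by on an unindexed column, the execute returns almost immediately while the first fetch pays for the sort, exactly as described.

import time
import cx_Oracle  # assumed driver; the newer python-oracledb exposes the same calls

# Placeholder credentials and DSN.
connection = cx_Oracle.connect(user="scott", password="tiger", dsn="dbserver/orcl")
cursor = connection.cursor()

start = time.time()
cursor.execute("select * from one_billion_row_table order by unindexed_column")
print("execute phase: %.3f s" % (time.time() - start))  # just opens the cursor, close to zero

start = time.time()
first_row = cursor.fetchone()
print("first fetch:   %.3f s" % (time.time() - start))  # the sort happens here for this "slow return query"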