62

I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?

# this takes over 3 seconds to return
session.query(Segment).count()

However:

SELECT COUNT(*) FROM segments;
+----------+
| COUNT(*) |
+----------+
|   281992 |
+----------+
1 row in set (0.07 sec)

The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).

Update

Using session.query(Segment.id).count() instead of session.query(Segment).count() seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.

mtth
  • 4,671
  • 3
  • 30
  • 36
  • 6
    I don't know that SQLAlchemy thing, but it sounds as if it would iterate over the result instead of sending a count(*) to the backend. –  Feb 07 '13 at 15:40
  • The [docs](http://docs.sqlalchemy.org/ru/latest/orm/query.html#sqlalchemy.orm.query.Query.count) seem to say that the count function simply issues a count statement though. – mtth Feb 07 '13 at 15:44
  • 1
    Enable logging using `echo=True` parameter when you create an `engine` so see the `SQL` statement actually generated. – van Feb 07 '13 at 15:46
  • 1
    Thanks for the suggestion. Using `session.query(Segment).count()` actually issues a `count(*) FROM (SELECT segments.column_1 as segments_column_1, ... FROM segments)` instead of a simple `count(*) FROM segments`. I guess the slow down comes from expanding all the columns. – mtth Feb 07 '13 at 15:54
  • 2
    Just to mention that these queries have same performance on PostgreSQL. – Taha Jahangir Apr 21 '14 at 13:19

3 Answers3

91

Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id)):

Return a count of rows this Query would return.

This generates the SQL for this Query as follows:

SELECT count(1) AS count_1 FROM (
     SELECT <rest of query follows...> ) AS anon_1

For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:

from sqlalchemy import func

# count User records, without
# using a subquery.
session.query(func.count(User.id))

# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
        group_by(User.name)

from sqlalchemy import distinct

# count distinct "name" values
session.query(func.count(distinct(User.name)))
Mario Rossi
  • 7,651
  • 27
  • 37
zzzeek
  • 72,307
  • 23
  • 193
  • 185
  • Thanks for the reply. Could you elaborate on the 'terrible support of subqueries' from MySQL? – mtth Feb 08 '13 at 02:30
  • the best explanation via google is unfortunately this most horribly formatted blog post, but gets the idea across: http://www.mysqlperformanceblog.com/2010/10/25/mysql-limitations-part-3-subqueries/ – zzzeek Feb 08 '13 at 06:06
  • longer post, but near the end goes into more detail about MySQL's planner in this regard: http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/ – zzzeek Feb 08 '13 at 06:12
  • How do you get the actual int/long count value from the query? If I call count on session.query(func.count(Table.id)), it throws an exception.. – dpb Nov 05 '13 at 03:23
  • 7
    session.query(func.count(Table.id)).scalar() will give you the first col of the first row. – zzzeek Nov 06 '13 at 17:41
  • Is it possible to use this optimized query on a (lazy='dynamic') relationship? So if I have e.g. `self.relationship.filter_by(some_column=some_value).count()` is there a way to still use the optimized query? – Andrew Mar 25 '16 at 15:50
  • 4
    yes you can say `self.relationship.with_entities(func.count(User.id)).scalar()`. – zzzeek Mar 25 '16 at 22:59
22

The reason is that SQLAlchemy's count() is counting the results of a subquery which is still doing the full amount of work to retrieve the rows you are counting. This behavior is agnostic of the underlying database; it isn't a problem with MySQL.

The SQLAlchemy docs explain how to issue a count without a subquery by importing func from sqlalchemy.

session.query(func.count(User.id)).scalar()

>>>SELECT count(users.id) AS count_1 \nFROM users')
chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
aeb0
  • 816
  • 9
  • 12
10

It took me a long time to find this as the solution to my problem. I was getting the following error:

sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): Incorrect key file for table '/tmp/#sql_40ab_0.MYI'; try to repair it

The problem was resolved when I changed this:

query = session.query(rumorClass).filter(rumorClass.exchangeDataState == state)
return query.count()

to this:

query = session.query(func.count(rumorClass.id)).filter(rumorClass.exchangeDataState == state)
return query.scalar()
Jerry K.
  • 500
  • 5
  • 9
  • 9
    How does this relate to the question? If an error is thrown, the situation looks completely different than described by the OP – Nico Haase Jan 22 '19 at 13:37
  • 1
    very curious why this is faster and if it's still the faster option in 2022 – Matt Nov 13 '22 at 19:04