70

SQLAlchemy's Query.distinct method is behaving inconsistently:

>>> [tag.name for tag in session.query(Tag).all()]
[u'Male', u'Male', u'Ninja', u'Pirate']
>>> session.query(Tag).distinct(Tag.name).count()
4
>>> session.query(Tag.name).distinct().count()
3

So the second form gives the correct result but the first form does not. This appears to happen with SQLite but NOT with Postgres. I have a function which is passed a query object to have a distinct clause applied to it, so it would be highly difficult to rewrite everything top use the second approach above. Is there something obvious that I'm missing?

Eli Courtwright
  • 186,300
  • 67
  • 213
  • 256

2 Answers2

86

According to the docs:

When present, the Postgresql dialect will render a DISTINCT ON (>) construct.

So, passing column expressions to distinct() works for PostgreSQL only (because there is DISTINCT ON).

In the expression session.query(Tag).distinct(Tag.name).count() sqlalchemy ignores Tag.name and produces the query (distinct on all fields):

SELECT DISTINCT tag.country_id AS tag_country_id, tag.name AS tag_name 
FROM tag

As you said, in your case distinct(Tag.name) is applied - so instead of just count() consider using this:

session.query(Tag).distinct(Tag.name).group_by(Tag.name).count()
starball
  • 20,030
  • 7
  • 43
  • 238
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • 1
    Thanks; I saw that comment in the docs about `DISTINCT ON` but since the docs didn't explicitly say 'and that's the only way that this works' or anything like it, I didn't realize that was implied. – Eli Courtwright Jun 20 '13 at 21:37
  • 1
    Downvoted because promoting the SQLite specific extension, which allows selecting non-aggregate columns that are not determined by the GROUP BY clause, producing indeterminate results (well not exactly here because of the `count()`, but correct solutions exist, like in the other answer). – Ilja Everilä Aug 17 '18 at 05:05
  • @IljaEverilä nice duplet downvote, totally by an accident. – alecxe Aug 17 '18 at 05:14
  • @IljaEverilä thanks for the comment though, it's not often that happens with a downvote, but certainly a desired behavior. I didn't really promote anything here, just tried to help answering an *sqlite-specific question*. – alecxe Aug 17 '18 at 05:19
  • Though this is horribly hindsighted, given you're answering an sqlite-specific question, why not drop the `distinct(Tag.name)` altogether, because it's useless in SQLite and the query as it is will be rejected by most SQL DBMS due to the grouping. On the double dv, I do share links to interesting q/a (one way or the other). This would make a nice dupe target, if not for the accepted answer, though you nicely explain why `distinct()` even accepts expressions as args. – Ilja Everilä Aug 17 '18 at 06:23
49

When you use session.query(Tag) you alway query for the whole Tag object, so if your table contains other columns it won't work.

Let's assume there is an id column, then the query

sess.query(Tag).distinct(Tag.name)

will produce:

SELECT DISTINCT tag.id AS tag_id, tag.name AS tag_name FROM tag

The argument to the distinct clause is ignored completely.

If you really only want the distinct names from the table, you must explicitly select only the names:

sess.query(Tag.name).distinct()

produces:

SELECT DISTINCT tag.name AS tag_name FROM tag
mata
  • 67,110
  • 10
  • 163
  • 162