SQLAlchemy: Selecting one row for each distinct value of a column

Question

I want to select one row each for each distinct value of a column. Here I want to perform it for col1:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  2 |    0 |    1 |
|  3 |    1 |   11 |
|  4 |    1 |   12 |
|  5 |    1 |   16 |

Which results in:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  3 |    1 |   11 |

as 0 and 1 were the distinct values for col1. I am not strict about which row is returned (no need of least value of id, for example) as long as distinct values of column is returned.

Please assume the model is called TestModel and everything is fully mapped.

In MySQL, that's a simple `GROUP BY` – John Dvorak Nov 20 '12 at 07:01 — John Dvorak, Nov 20 '12 at 07:01

score 6 · Answer 1 · edited Nov 20 '12 at 07:58

6

You need to use sub-query for that. And also you have to choose MIN or MAX function:

SELECT * FROM TestModel
WHERE ID IN(SELECT MIN(id) 
            FROM TestModel 
            GROUP BY col1)

Output:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  3 |    1 |   11 |

See this SQLFiddle

SA version of the solution:

subq = (session.query(func.min(TestModel.id).label("min_id")).
        group_by(TestModel.col1)).subquery()

qry = (session.query(TestModel).
       join(subq, and_(TestModel.id == subq.c.min_id)))

edited Nov 20 '12 at 07:58

van

74,297
13
168
171

answered Nov 20 '12 at 07:08

Himanshu

31,810
31
111
133

I want syntax for SQLAlchemy, not plain SQL – Jesvin Jose Nov 20 '12 at 07:25
@van: Thanks. Actually I don't have knowledge about SQLAlchemy. :) – Himanshu Nov 23 '12 at 06:21
@van, please tell the advantages of your query; it is more complex than my answer, and I see no obvious benefits. – Jesvin Jose Nov 23 '12 at 09:51
2

simple: your (by SA generated SQL) code will not work with all the RDBMS systems. For example, it will not run on MSSQL. One of the benefits of the SA is that your product is (might be) RDBMS-agnostic. I personally value this a lot. But if you use `MySQL`, `sqlite`, `Oracle` or other RDBMS where `GROUP BY` is not strict, your code is perfect. I am not sure about ANSI SQL compatibility though. – van Nov 23 '12 at 12:34

score 4 · Answer 2 · answered Aug 25 '15 at 08:28

4

As of SQLAlchemy 1.0 you can call distinct with a criterion, which allows you to write

session.query(TestModel).distinct(TestModel.col1).all()

In Postgresql this will translate to a DISTINCT ON query.

This not only works with the ORM but also with a SQLAlchemy Core select.

answered Aug 25 '15 at 08:28

kynan

13,235
6
79
81

This solution worked for me, but the query seemed less efficient than group by. I didn't really do benchmark though. But wouldn't `SELECT MIN(TestModel.id) FROM TestModel GROUP BY col1` be more efficient? – Guy Jul 28 '16 at 06:07

Jesvin Jose · Accepted Answer · 2012-11-21T14:09:15.833

3

This will return TestModel objects for each distinct values of TestModel.col1 column, just like GROUP BY queries in SQL.

session.query( TestModel).group_by( TestModel.col1).all()

edited Nov 21 '12 at 14:09

answered Nov 21 '12 at 11:00

Jesvin Jose

22,498
32
109
202

Maybe you should elaborate on this a bit more. Code-only answers aren't really helpful. – Linus Kleen Nov 21 '12 at 11:20
What am I missing? this doesn't seem to work at all. The produced SQL (under postgresql) is `SELECT "order".id AS order_id, "order".external_id AS order_external_id FROM "order" GROUP BY "order".external_id`. This SQL doesn't work because you use an aggregate function on the columns that aren't part of the group-by parameters (order.id etc.). I would expect to have "FIRST" or "MIN" or something similar. – Guy Jul 28 '16 at 05:48
Hmm, I would now use `distinct on` for Postgres. I was also confused by Postgres not working on using "an aggregate function on the columns that aren't part of the group-by parameters". Please post a truly cross-platform answer and I will accept it. – Jesvin Jose Jul 29 '16 at 04:55

SQLAlchemy: Selecting one row for each distinct value of a column

3 Answers3

See this SQLFiddle

Linked