5

I want to select one row each for each distinct value of a column. Here I want to perform it for col1:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  2 |    0 |    1 |
|  3 |    1 |   11 |
|  4 |    1 |   12 |
|  5 |    1 |   16 |

Which results in:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  3 |    1 |   11 |

as 0 and 1 were the distinct values for col1. I am not strict about which row is returned (no need of least value of id, for example) as long as distinct values of column is returned.

Please assume the model is called TestModel and everything is fully mapped.

Jesvin Jose
  • 22,498
  • 32
  • 109
  • 202

3 Answers3

6

You need to use sub-query for that. And also you have to choose MIN or MAX function:

SELECT * FROM TestModel
WHERE ID IN(SELECT MIN(id) 
            FROM TestModel 
            GROUP BY col1)

Output:

| ID | COL1 | COL2 |
--------------------
|  1 |    0 |    0 |
|  3 |    1 |   11 |

See this SQLFiddle


SA version of the solution:

subq = (session.query(func.min(TestModel.id).label("min_id")).
        group_by(TestModel.col1)).subquery()

qry = (session.query(TestModel).
       join(subq, and_(TestModel.id == subq.c.min_id)))
van
  • 74,297
  • 13
  • 168
  • 171
Himanshu
  • 31,810
  • 31
  • 111
  • 133
  • I want syntax for SQLAlchemy, not plain SQL – Jesvin Jose Nov 20 '12 at 07:25
  • @van: Thanks. Actually I don't have knowledge about SQLAlchemy. :) – Himanshu Nov 23 '12 at 06:21
  • @van, please tell the advantages of your query; it is more complex than my answer, and I see no obvious benefits. – Jesvin Jose Nov 23 '12 at 09:51
  • 2
    simple: your (by SA generated SQL) code will not work with all the RDBMS systems. For example, it will not run on MSSQL. One of the benefits of the SA is that your product is (might be) RDBMS-agnostic. I personally value this a lot. But if you use `MySQL`, `sqlite`, `Oracle` or other RDBMS where `GROUP BY` is not strict, your code is perfect. I am not sure about ANSI SQL compatibility though. – van Nov 23 '12 at 12:34
4

As of SQLAlchemy 1.0 you can call distinct with a criterion, which allows you to write

session.query(TestModel).distinct(TestModel.col1).all()

In Postgresql this will translate to a DISTINCT ON query.

This not only works with the ORM but also with a SQLAlchemy Core select.

kynan
  • 13,235
  • 6
  • 79
  • 81
  • This solution worked for me, but the query seemed less efficient than group by. I didn't really do benchmark though. But wouldn't `SELECT MIN(TestModel.id) FROM TestModel GROUP BY col1` be more efficient? – Guy Jul 28 '16 at 06:07
3

This will return TestModel objects for each distinct values of TestModel.col1 column, just like GROUP BY queries in SQL.

session.query( TestModel).group_by( TestModel.col1).all()
Jesvin Jose
  • 22,498
  • 32
  • 109
  • 202
  • Maybe you should elaborate on this a bit more. Code-only answers aren't really helpful. – Linus Kleen Nov 21 '12 at 11:20
  • What am I missing? this doesn't seem to work at all. The produced SQL (under postgresql) is `SELECT "order".id AS order_id, "order".external_id AS order_external_id FROM "order" GROUP BY "order".external_id`. This SQL doesn't work because you use an aggregate function on the columns that aren't part of the group-by parameters (order.id etc.). I would expect to have "FIRST" or "MIN" or something similar. – Guy Jul 28 '16 at 05:48
  • Hmm, I would now use `distinct on` for Postgres. I was also confused by Postgres not working on using "an aggregate function on the columns that aren't part of the group-by parameters". Please post a truly cross-platform answer and I will accept it. – Jesvin Jose Jul 29 '16 at 04:55