SQLAlchemy: How to use group_by() correctly (only_full_group_by)?

Question

I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:

rows = session.query(MyModel) \
        .order_by(MyModel.published_date.desc()) \
        .group_by(MyModel.category_id) \
        .all()

It works fine with SQLite, but for MySQL I get this error:

[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.

What's the proper solution with SQLAlchemy?

Thanks in advance

proper solution is fixing the wrong sql. Put sql query and we can help to tell what is wrong with it — Livius, Oct 30 '19 at 07:36
The query does not really work in SQLite afaic; it only guarantees certain behaviour for "bare" / non-aggregate columns when `min()` or `max()` aggregates are used: https://www.sqlite.org/lang_select.html. It has never really worked (as in had well defined behaviour) in MySQL either. It looks like you're trying to perform a [tag:greatest-n-per-group] query, selecting the latest `MyModel` per `category_id` by `published_date`, is that correct? — Ilja Everilä, Oct 30 '19 at 07:43
@IljaEverilä It works in SQLite for me. Weird. Yes, I'm trying to get the latest/greatest of each category. It works in MySQL with `set sql_mode="traditional";`, which I'd like to avoid. — Mr. B., Oct 30 '19 at 08:58
It may return the results you're after *most of the time*, but the behaviour is not well defined and may break, as was the case with MySQL when they finally started following the SQL standard more closely in this matter. — Ilja Everilä, Oct 30 '19 at 09:11
Related: https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql, https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column, https://stackoverflow.com/questions/10999522/how-to-get-the-latest-record-in-each-group-using-group-by — Ilja Everilä, Oct 30 '19 at 09:59
@IljaEverilä Thank you very much. I'm looking forward to test your answer later today. :-) — Mr. B., Oct 30 '19 at 11:22
Depending on your query you just could need a simple solution like this one https://stackoverflow.com/questions/34115174/error-related-to-only-full-group-by-when-executing-a-query-in-mysql?rq=1 — Yor Jaggy, Feb 25 '20 at 03:06

Ilja Everilä · Accepted Answer · 2019-10-30T09:46:56.600

One way to form the greatest-n-per-group query with well defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with greater published_date:

my_model_alias = aliased(MyModel)

rows = session.query(MyModel).\
    outerjoin(my_model_alias,
              and_(my_model_alias.category_id == MyModel.category_id,
                   my_model_alias.published_date > MyModel.published_date)).\
    filter(my_model_alias.id == None).\
    all()

This will work in about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:

sq = session.query(
        MyModel,
        func.row_number().
            over(partition_by=MyModel.category_id,
                 order_by=MyModel.published_date.desc()).label('rn')).\
    subquery()

my_model_alias = aliased(MyModel, sq)

rows = session.query(my_model_alias).\
    filter(sq.c.rn == 1).\
    all()

Of course you could use GROUP BY as well, if you then use the results in a join:

max_pub_dates = session.query(
        MyModel.category_id,
        func.max(MyModel.published_date).label('published_date')).\
    group_by(MyModel.category_id).\
    subquery()

rows = session.query(MyModel).\
    join(max_pub_dates,
         and_(max_pub_dates.category_id == MyModel.category_id,
              max_pub_dates.published_date == MyModel.published_date)).\
    all()

SQLAlchemy: How to use group_by() correctly (only_full_group_by)?

1 Answers1