1

I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:

rows = session.query(MyModel) \
        .order_by(MyModel.published_date.desc()) \
        .group_by(MyModel.category_id) \
        .all()

It works fine with SQLite, but for MySQL I get this error:

[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.

What's the proper solution with SQLAlchemy?

Thanks in advance

Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
Mr. B.
  • 8,041
  • 14
  • 67
  • 117
  • proper solution is fixing the wrong sql. Put sql query and we can help to tell what is wrong with it – Livius Oct 30 '19 at 07:36
  • The query does not really work in SQLite afaic; it only guarantees certain behaviour for "bare" / non-aggregate columns when `min()` or `max()` aggregates are used: https://www.sqlite.org/lang_select.html. It has never really worked (as in had well defined behaviour) in MySQL either. It looks like you're trying to perform a [tag:greatest-n-per-group] query, selecting the latest `MyModel` per `category_id` by `published_date`, is that correct? – Ilja Everilä Oct 30 '19 at 07:43
  • @IljaEverilä It works in SQLite for me. Weird. Yes, I'm trying to get the latest/greatest of each category. It works in MySQL with `set sql_mode="traditional";`, which I'd like to avoid. – Mr. B. Oct 30 '19 at 08:58
  • It may return the results you're after *most of the time*, but the behaviour is not well defined and may break, as was the case with MySQL when they finally started following the SQL standard more closely in this matter. – Ilja Everilä Oct 30 '19 at 09:11
  • Related: https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql, https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column, https://stackoverflow.com/questions/10999522/how-to-get-the-latest-record-in-each-group-using-group-by – Ilja Everilä Oct 30 '19 at 09:59
  • @IljaEverilä Thank you very much. I'm looking forward to test your answer later today. :-) – Mr. B. Oct 30 '19 at 11:22
  • 1
    Depending on your query you just could need a simple solution like this one https://stackoverflow.com/questions/34115174/error-related-to-only-full-group-by-when-executing-a-query-in-mysql?rq=1 – Yor Jaggy Feb 25 '20 at 03:06

1 Answers1

1

One way to form the query with well defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with greater published_date:

my_model_alias = aliased(MyModel)

rows = session.query(MyModel).\
    outerjoin(my_model_alias,
              and_(my_model_alias.category_id == MyModel.category_id,
                   my_model_alias.published_date > MyModel.published_date)).\
    filter(my_model_alias.id == None).\
    all()

This will work in about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:

sq = session.query(
        MyModel,
        func.row_number().
            over(partition_by=MyModel.category_id,
                 order_by=MyModel.published_date.desc()).label('rn')).\
    subquery()

my_model_alias = aliased(MyModel, sq)

rows = session.query(my_model_alias).\
    filter(sq.c.rn == 1).\
    all()

Of course you could use GROUP BY as well, if you then use the results in a join:

max_pub_dates = session.query(
        MyModel.category_id,
        func.max(MyModel.published_date).label('published_date')).\
    group_by(MyModel.category_id).\
    subquery()

rows = session.query(MyModel).\
    join(max_pub_dates,
         and_(max_pub_dates.category_id == MyModel.category_id,
              max_pub_dates.published_date == MyModel.published_date)).\
    all()
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127