1

I have a legacy application in which Group By has been used along with non aggregate columns to fetch the first row for each group. The query is as following:

SELECT
    columnPrimaryKey,
    column1,
    column2,
    column3
FROM
    (SELECT
        columnPrimaryKey,
        column1,
        column2,
        column3
    FROM testTable
    ORDER BY column2
) AS tbl
GROUP BY column3

Recently, the version was updated to 5.7.22 and now the above query is not returning expected results even after disabling the ONLY_FULL_GROUP_BY mode.

Yes, I can rewrite the query as following to work according to the new behavior as:

SELECT
    x.columnPrimaryKey,
    x.column1,
    x.column2,
    x.column3
FROM tableName AS x INNER JOIN (
    SELECT
        MIN( column2 ) AS column2,
        column3
    FROM tableName
    GROUP BY column3
) AS y ON x.column2 = y.column2 AND x.column3 = y.column3;

Unfortunately, that's not an option for now. The only option I see is to downgrade to 5.7.5 earlier.

Fiddle 5.7 with 'ONLY_FULL_GROUP_BY' disabled and unexpected results:

https://www.db-fiddle.com/f/8VjB7XpkobWVyXpPvUaGt2/0

Fiddle 5.6 with default modes and expected results:

https://www.db-fiddle.com/f/8VjB7XpkobWVyXpPvUaGt2/1

My question is: Is there any way to disable this behavior of random selection so that the legacy code works without rewriting them or downgrading?

Any suggestions greatly appreciated!

Abhishekh Gupta
  • 6,206
  • 4
  • 18
  • 46
  • I don't know of a solution to your problem. Lesson learned: _Don't_ rely on only full group by mode being turned off. As you've seen, there is no guarantee the results will be deterministic, and in addition, your query would not even run on most other databases, if you had to port it. – Tim Biegeleisen Aug 22 '18 at 14:33
  • @TimBiegeleisen You are absolutely right but as I said it's an legacy application so I am looking to avoid rewriting the queries for now. And this leaves me to only solution i.e., downgrade the version :( – Abhishekh Gupta Aug 22 '18 at 14:37
  • Then downgrade for a while, and when you get a chance to rewrite the queries, the upgrade again. Nothing wrong with that IMO. – Tim Biegeleisen Aug 22 '18 at 14:37
  • @TimBiegeleisen Yeah, would have to do that. Thanks – Abhishekh Gupta Aug 22 '18 at 14:39
  • This is happening in the newer MySQL and MariaDB ( https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/ ) (p.s MySQL docs does **not** have this documentated).. Adding a LIMIT in the MySQL 5.7 fiddle "fixes" your result https://www.db-fiddle.com/f/8VjB7XpkobWVyXpPvUaGt2/3 .. – Raymond Nijland Aug 22 '18 at 14:57
  • It's not only a problem with invalid use off `GROUP BY` @TimBiegeleisen check mine other comment.. The SQL handling in MySQL 5.7 looks to changed to follow SQL standards more. Only problem with the "fix" is that MySQL/MariaDB needs to use "Using temporary; Using filesort" which is very very very far from optimal when used on very large tables – Raymond Nijland Aug 22 '18 at 15:07
  • @RaymondNijland Good detective work, but I was commenting on the premise that the OP cannot change the query. – Tim Biegeleisen Aug 22 '18 at 15:18

1 Answers1

1

Your ORDER BY in the derived table subquery is ignored in MySQL 5.7.

See https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html

The optimizer propagates an ORDER BY clause in a derived table or view reference to the outer query block if these conditions are all true:

  • The outer query is not grouped or aggregated.

  • The outer query does not specify DISTINCT, HAVING, or ORDER BY.

  • The outer query has this derived table or view reference as the only source in the FROM clause.

Otherwise, the optimizer ignores the ORDER BY clause.

Your outer query has a JOIN and a GROUP BY, so it doesn't qualify to propagate the ORDER BY, therefore it ignores the ORDER BY.

This optimizer behavior is controlled by the optimizer switch derived_merge. You can disable it.

Demo:

mysql [localhost] {msandbox} (test) > select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.21    |
+-----------+

mysql [localhost] {msandbox} (test) > SELECT     columnPrimaryKey,     column1,     column2,     column3 FROM     (SELECT         columnPrimaryKey,         column1,         column2,         column3     FROM testTable     ORDER BY column2 ) AS tbl GROUP BY column3;
+------------------+----------------+---------+---------+
| columnPrimaryKey | column1        | column2 | column3 |
+------------------+----------------+---------+---------+
|                1 | Some Name 8-4  |       4 |       8 |
|                6 | Some Name 9-1  |       1 |       9 |
|                8 | Some Name 10-2 |       2 |      10 |
+------------------+----------------+---------+---------+

mysql [localhost] {msandbox} (test) > set optimizer_switch = 'derived_merge=off';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT     columnPrimaryKey,     column1,     column2,     column3 FROM     (SELECT         columnPrimaryKey,         column1,         column2,         column3     FROM testTable     ORDER BY column2 ) AS tbl GROUP BY column3;
+------------------+----------------+---------+---------+
| columnPrimaryKey | column1        | column2 | column3 |
+------------------+----------------+---------+---------+
|                5 | Some Name 8-1  |       1 |       8 |
|                6 | Some Name 9-1  |       1 |       9 |
|                8 | Some Name 10-2 |       2 |      10 |
+------------------+----------------+---------+---------+
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Ok it was documentated, i commented it was not documentated in the MySQL docs... Using a `set optimizer_switch = 'derived_merge=off'` query after connecting to the MySQL database in application should save the topicstarter to rewrite all queries.. – Raymond Nijland Aug 22 '18 at 15:33
  • 1
    One can also set the optimizer switch globally, and make it persistent in /etc/my.cnf – Bill Karwin Aug 22 '18 at 15:35