2

I answered a this question: https://stackoverflow.com/a/18521684/1707323 with a query similiar to:

SELECT
  *
FROM
  (
    SELECT
      *
    FROM
      table_name
    WHERE
      table_name.some_field='1' OR
      table_name.some_field='2'
    ORDER BY
      table_name.some_field
  ) sub_query
GROUP BY
  sub_query.primary_key_column

And the comment was made

Clever idea, but this is not guaranteed to work in MySQL, and will cause a hard error in SQL Server and most other database systems. Quoting from Extensions to GROUP BY: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

It's entirely possible that for your specific version of MySQL, for your specific tables, it will always give you the results you want, because the query plan you get causes your query to be executed exactly the way you expect. But if the documentation explicitly states that it will indeterminately choose a value, and that your ORDER BY is not enough to make this query reliable, I would be very cautious about trying this on other databases, or other versions of MySQL.

The MySQl manual continues also to say: Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses. Which would be true for queries such as

SELECT
  *
FROM
  table_name
WHERE
  table_name.some_field='1' OR
  table_name.some_field='2'
GROUP BY
  table_name.primary_key_column
ORDER BY
  table_name.some_field

But that is different than the query with the order by clause in the sub query. I did find similar question, but yet again it does not involve the ORDER BY clause in the sub query

Can anyone shed some light on this in reference to the MySQL manual whether an ORDER BY clause in the subquery will force grouping to always use the first instance of that group.

EDIT: Observations

Shouldn't FIFO or LIFO have something to do with GROUP BY with the way it operates? I just can't wrap my mind around:

The server is free to choose any value from each group

There has to be some sort of logic to this choice. The computer is not allowed to make choices on its own without some sort of directive. If the computer can make choices on its own without any directive then we have truly reached the ultimate goal in computing, ARTIFICIAL INTELLIGENCE, and now the computers can write all my code by giving them one directive for the application. [LOL] What are the odds that this is undocumented logic? wouldn't RAND in the logic take more processing power than FIFO or FILO? There has to be some logic involved here that may just be undocumented but works 100% of the time. One thing I always loved is proofs in Geometry. I learned that you have to prove if it is right or wrong. I can prove how it works, yet I have not seen how it does not work. Prove how it is so if it works that way or prove how it is not so if it does not work that way.

Community
  • 1
  • 1
amaster
  • 1,915
  • 5
  • 25
  • 51
  • The `ORDER BY` still won't work. The subquery produces results and then orders them in your specific order. But the `GROUP BY` clause in the parent query is still free to choose any item from the group. This would be akin to expecting it to choose a determinate value from the group in a regular table if the table was in a specific order. – Cfreak Aug 30 '13 at 04:34
  • @Cfreak Well here is my example of how it is so every time [SQL Fiddle](http://sqlfiddle.com/#!2/68627/4) Now can you show me how it is not so with an example where the order by does not effect the group by?? – amaster Aug 30 '13 at 04:50
  • 1
    spencer7593's answer, answers your question. It might work for your specific version but it's not guaranteed to work. You shouldn't rely on it. – Cfreak Aug 30 '13 at 13:36
  • @amaster507 Your tenacity is admirable but unfounded. Aside from the theoretical problems with this approach, I know of no example where this solution performs faster than the (documented) 'uncorrelated subquery' method. – Strawberry Aug 30 '13 at 15:49

1 Answers1

1

No. An ORDER BY in an inline view does not force the GROUP BY operation in the outer query to obtain non-aggregate values from the "first" row in the group. MySQL is free to choose any row from the group.

You may observe that this is what happens, but this behavior is not guaranteed.

The behavior may be different using different database engines, or a patchset or newer version of MySQL, or may change just with the value distribution within the tables.


followup

(based on updated question)

The idea behind SQL (Structured Query Language) is that it is declarative, in that it declares the resultset to be returned, and does not specify "how" the result set is to be returned.

You are quite correct that well-written code isn't 'random' in nature, it executes following a set of rules, and we observe consistent behavior. However, this is not a guarantee that this behavior won't ever change. (Compilers implement optimizations which result in machine code that gives the same result, but operates differently.)

For years, when we did a GROUP BY in Oracle, we observed that Oracle would use a sort operation as part of the "how" it went about satisfying the GROUP BY. Everytime we ran the same query, Oracle would do the same sort. But Oracle never made a guarantee that a sort operation would always be used.

When a new version of Oracle was rolled out, lo and behold, we observe that Oracle is now using a hash operation, rather than a sort operation, to satisfy a GROUP BY. Any query that included an ORDER BY clause continued to return results in the order that was specified (guaranteed behavior), but other queries which didn't have an ORDER BY had rows returned in a different order. Oracle just improved "how" it went about preparing a resultset that satisfied the requirements.

This one example demonstrates why we don't write queries that depend on a behavior that is not guaranteed.

In some later release of MySQL, we will likely observe changes in the behavior, where the resultset returned by a query is guaranteed to meet the specifications, but will be DIFFERENT from the resultset returned currently.

One of the primary design goals of SQL is that we declare the result set is to be returned, without specifying "how" the DBMS is to go about producing the resultset. We do use hints, and we structure queries in certain ways, and that does influence the execution plan generated by the optimizer. But this is not a guarantee.

If the resultset is to meet a specified requirement, then the SQL should include that specification; we should not rely on a particular observed behavior to imply that specification.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • I guess this is the right answer according to the actual MySQL manuals... But I am still interested to see at least one example that works like the manual says and not work with a subquery and ORDER BY – amaster Aug 30 '13 at 13:51
  • So in other words, it does work that way with FIFO currently and does not necessarily allow the server to be free to choose any value from each group, but they include that statement to allow the developers of MySQL the freedom to change the FIFO to LIFO or some other mechanism later on that might break some queries that currently work by using that behavior. So technically that statement `The server is free to choose any value...` should be `The server's choice of the value from each group is based on the current version's optimization logic which may change from version to version` – amaster Aug 30 '13 at 21:35
  • thanks again for the great answer and explaining this to me. I have submitted a "bug" request for MySQL to update that bit of the manual to clarify that the server is not actually allowed to choose any row but that the logic may change. I quoted parts of what you said and credited you. the documentation "bug" can be found [here](http://bugs.mysql.com/bug.php?id=70203) – amaster Aug 30 '13 at 21:51