2

I have some strange problem with one select. Is it possible that the order in WHERE clause can influence the result?

Here is my select:

select u.userName, u.fullName, g.uuid as groupUuid, g.name as `group`, 
    m.number as module, count(distinct b.uuid) as buildCount, max(b.datetime),
    count(distinct e.buildId) as errorBuildCount, e.id as errorId
    from User u
    inner join GROUP_USER GU on GU.user_id = u.id 
    inner join `Group` g on g.id = GU.group_id 
    inner join Course c on c.id = g.courseId
    left outer join Build b on b.userId = u.id 
    left outer join Module m on m.id = b.moduleId 
    left outer join Error e on e.buildId = b.id
    where c.uuid = 'HMUUcabR1S4GRTIwt3wWxzCO' and g.uuid = 'abcdefghijklmnopqrstuvwz'
    group by u.userName,m.number,c.uuid, g.uuid
    order by g.id asc, u.fullName asc, m.number asc

this will reproduce this result: http://dl.dropbox.com/u/4892450/sqlSelectProblem/select1.PNG

When I use this condition:

where g.uuid = 'abcdefghijklmnopqrstuvwz' and c.uuid = 'HMUUcabR1S4GRTIwt3wWxzCO' (different order) I get a different result (see errorId column): http://dl.dropbox.com/u/4892450/sqlSelectProblem/select2.PNG

Could you please help me? Is the whole select wrong, or can it be a mysql bug?

CosminO
  • 5,018
  • 6
  • 28
  • 50
iFL
  • 33
  • 1
  • 6
  • 1
    so the only difference between those two screen caps is the one errorID where one is `9` and in the other it's `NULL`? – Marc B Mar 04 '13 at 15:27
  • Perhaps your underlying data changed between the two queries. The order in the `where` clause would not affect the results. – Gordon Linoff Mar 04 '13 at 15:28
  • 3
    Probably this is due to the fact that you have several columns in your `SELECT` which are not also in the `GROUP BY`. While MySQL allows this (other RDBMS won't), it often results in unexpected behavior for non-grouped columns (indeterminate results). – Michael Berkowski Mar 04 '13 at 15:28
  • I would re-examine your join on `Error e`. – Marcus Adams Mar 04 '13 at 15:29

3 Answers3

3

The only difference between the results is an errorId column. Ungrouped and unaggregated columns are not allowed by sql standard (sql-92 standard, check out the link) and will not even run in most db engines. So, engine's behavior in this situation is not specified. Accoding to docs (thanks to Marcus Adams):

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

You can get errorId as an aggregate value:

MAX(e.id) as errorId

or include it in GROUP BY list:

group by u.userName,m.number,c.uuid, g.uuid,e.Id

Then your query results should be stable.

Further reading:

Why does MySQL add a feature that conflicts with SQL standards? - detailed explanation of differences between sql standard and mysql implementation. (Thanks to GarethD)

Community
  • 1
  • 1
default locale
  • 13,035
  • 13
  • 56
  • 62
  • 1
    Not deliberately random, so much as "indeterminate". Often it takes the first one per group but that isn't guaranteed. – Michael Berkowski Mar 04 '13 at 15:30
  • 1
    Although your answer is correct for this problem the statement that **Ungrouped and unaggregated columns are not allowed by sql standard** is not. Although this has not been implemented fully in any DBMS I am aware of, the standard states that Columns that are functionally dependant on a column contained in the group by can be included. E.g. Grouping by a primary key would allow you to reference any column in that table. – GarethD Mar 04 '13 at 15:42
  • 1
    Here's the [MySQL docs](http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html) on the matter. – Marcus Adams Mar 04 '13 at 15:43
  • @GarethD Thank you very much for useful comments. I'll try to update my answer and provide precise explanation. BTW I wouldn't mind to delete my own answer and vote up if anybody will provide rigorous description of problem and clean solution for it. – default locale Mar 04 '13 at 15:48
  • 1
    As I said your answer is still correct because MySQL has not implemented the standard in the way it allows columns to be selected that are not in an aggregate or in the group by. This should fully explain it http://stackoverflow.com/a/7596265/1048425 – GarethD Mar 04 '13 at 15:50
  • Thanks guys. [http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards/7596265#7596265](http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards/7596265#7596265) was very helpfull for me. – iFL Mar 04 '13 at 16:36
1

You've got two different JOIN trees in your code, essentially:

               user
              /    \
    group_user      build
       /              \
    group             module
       |               |
    course           error

such a construct leads to undefined results, especially if the results of the joins in one branch has a different number of matching records than in the other branch. MySQL has to try and fill in the missing bits, and guesses. Changing the order of your WHERE clause can and WILL change the full results because you're changing the way mysql does its guesses.

Marc B
  • 356,200
  • 43
  • 426
  • 500
0

Group by all columns before aggregation. Best Practices...IN MOST CASES. and could very possibly be distorting your answers...

Hituptony
  • 2,740
  • 3
  • 22
  • 44