3

A bit of an odd one this one. I am trying to run the following query joining 3 tables.

Select t3.id, t3.name, t3.phone_no, t1.reg_dtm, count(t1.reg_dtm)
from tableA t1, tableB t2, tableC t3
Where t1.id = t2.id
And t2.id = t3.id
Group by t3.id, t3.name, t3.phone_no, t1.reg_dtm
Order by t2.id, t1.reg_dtm

The above query returns the following error

ORA-00979: not a GROUP BY expression

But if i change it so that everything in the group by clause is in the order by clause then it works.

Select t3.id, t3.name, t3.phone_no, t1.reg_dtm, count(t1.reg_dtm)
from tableA t1, tableB t2, tableC t3
Where t1.id = t2.id
And t2.id = t3.id
Group by t3.id, t3.name, t3.phone_no, t1.reg_dtm
Order by t3.id, t3.name, t3.phone_no, t1.reg_dtm

What exactly is the reason for this?

I think the problem is possibly because t2.id shown in the order by statement on the first query is not part of the group by statement. If this is the cause then why does it matter? I have never experienced this before and didn't think that there was any relationship between the group by and the order by statements.

I tested the above on Oracle 10G as well as MySQL.

Thanks in advance

ziggy
  • 15,677
  • 67
  • 194
  • 287
  • You have a typo in your first SQL statement. See my answer, (far) below. – Larry Lustig Nov 14 '11 at 18:34
  • You really should not use implicit syntax again. It is a SQL antipattern. It is subject to accidental cross joins and is harder to maintain if you later want to change to left joins. It is simply a poor syntax that is almost 20 years out of date. – HLGEM Nov 14 '11 at 18:41

5 Answers5

4

The ORDER BY clause runs after everything else in the SELECT statement executes; in a GROUPing scenario, the result set is limited to the columns used to aggregate the data. If you don't have a column specified in your initial resultset, the processing engine doesn't understand what to do with the requested output.

In other words, since your query doesn't return distinct values for t2.id and t1.id (since they're not used in the GROUP BY clause), the engine can't return the data in that order.

Stuart Ainsworth
  • 12,792
  • 41
  • 46
4

The columns that are not mentioned in the GROUP BY clause technically should not be in the result set (unless they're aggregated, i.e. max(columnname))-- so how does it make sense to ORDER BY them? In other words, what would such a query even mean?

However, MySQL (I don't know about others) allows you to select columns that are not in the GROUP BY, which leads to lots of confusion for beginners wondering why they get strange query results.

As a sidenote, you may want to avoid implicit joins, as discussed here.

Community
  • 1
  • 1
Matt Fenwick
  • 48,199
  • 22
  • 128
  • 192
  • I suspect the problem is that mySQL has allowed this kind of bad query and the OP learned the bad syntax and now wonders whyOracles insists on the correct syntax (as do most other datbases that are not mySQL). – HLGEM Nov 14 '11 at 18:44
3

In general, you cannot order by columns that are not in the GROUP BY clause and that are not aggregate functions in the SELECT list because the database has no way to deterministically sort the results. The database doesn't generally know how such a column would be aggregated in the final results so it doesn't know how to handle the case where one row in a result is, for example, the result of aggregating rows in the base table with values of 1 & 4 and another row in the result is the result of aggregating rows in the base table with values of 2 & 3. Either way it chooses to sort the results could be considered incorrect.

Now, in this particular case, since you're doing inner joins on ID between all three tables, the database could, in theory, be smart enough to recognize that ORDER BY t2.id, t1.id is semantically equivalent to ORDER BY t3.id, t3.id which can be evaluated deterministically. I don't know of any database, however, that has built this sort of query transformation into its optimizer. And the trade-off of potential bugs introduced when such an analysis goes wrong would tend to argue against including it when the queries you're trying to run don't make much logical sense from a set theory standpoint.

Justin Cave
  • 227,342
  • 24
  • 367
  • 384
1

You can order only by columns that are in the select clause - so an order by t3.ud, t1.reg_dtm should do it and would have the same semantic.

miherrma
  • 336
  • 1
  • 5
1

Actually, if you read the SQL carefully, in the first statement you group by T3.ID, but sort by T2.ID.

If that's the real SQL that you're running, then your only problem is that you have a typo.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160