6

I have a query like this:

SELECT col1, col2, col3, col4, col5, SUM(col6) AS total
FROM table_name
WHERE col1 < 99999
GROUP BY 1,2,3,4,5

What does the GROUP BY statement actually accomplish here? The query does not work properly without the comma-separated integers.

James Skidmore
  • 49,340
  • 32
  • 108
  • 136
  • You generally aren't allowed to directly select columns which aren't in the `GROUP BY`. I'm assuming 1,2,3, etc are actually your column names? – Sam Dufel Jun 17 '11 at 00:06
  • 2
    What database product and version? It is possible that your DBMS supports a proprietary feature of referencing the columns by the position. Not recommended however. – Thomas Jun 17 '11 at 00:08
  • agree with @sam dufel if this is the exact query ... else could be column alias. Otherwise your query probably will error out saying that you must include the column in select into group by as well. – Rahul Jun 17 '11 at 00:09
  • @Thomas should write that as the answer. Several RDBMS allow reference to column _numbers_ in the `GROUP BY`. I find it useful, but because of non-portability it may not be a good idea. – Andrew Lazarus Jun 17 '11 at 00:13

3 Answers3

8

It is equivalent to writing:

SELECT col1, col2, col3, col4, col5, SUM(col6) AS total
  FROM table_name
 WHERE col1 < 99999
 GROUP BY col1, col2, col3, col4, col5

The numbers are the values/columns in the select-list expressed by ordinal position in the list, starting with 1.

The numbers used to mandatory; then the ability to use the expressions in the select-list was added. The expressions can get unwieldy, and not all DBMS allow you to use 'display labels' or 'column aliases' from the select-list in the GROUP BY clause, so occasionally using the column numbers is helpful.

In your example, it would be better to use the names - they are simple. And, in general, use names rather than numbers whenever you can.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • FYI: I'm sure only MySQL allows this nonsense currently – gbn Jun 17 '11 at 05:13
  • @gbn: which nonsense? The SQL standard still requires it, AFAIK. The SQL-86 standard, I'm almost certain, provided only the numbered column notation. Products that stretch back that far (DB2, Informix, Oracle, Sybase, Ingres, SQL Server, ...) will probably support the numbered notation. I can answer for Informix with certainty; it does allow numbered columns. Which DBMS do you know of that does not support it? – Jonathan Leffler Jun 17 '11 at 05:23
  • @gbn: Having said that, I've taken a look at the [SQL-92](http://www.savage.net.au/SQL/) grammar, and it is not clear that it allows column numbers, so maybe I'm misremembering. The notation arose from the original IBM SQL (de facto) specification, and in the early to mid 1980s, the systems followed that as the 'de facto' standard. – Jonathan Leffler Jun 17 '11 at 05:31
  • Support for ordinals in the ORDER BY is common and in the standard. Of mainstream RDBMS, only MySQL allows it in the GROUP BY. Not [SQL Server](http://stackoverflow.com/q/2253040/27535), not [Oracle](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:48704116042682#49018142476891) at least – gbn Jun 17 '11 at 05:40
  • [SQL-92 standard](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) and look for `20.2 ` and `7.7 `. Ordinals only in the ORDER BY. – gbn Jun 17 '11 at 05:42
  • Informix allows ordinals in the [GROUP BY](http://publib.boulder.ibm.com/infocenter/idshelp/v117/index.jsp?topic=/com.ibm.sqls.doc/ids_sqs_0165.htm) clause. It appears that DB2 v9.7 does not. Maybe it is less common than I thought...but the DBMS I use most does support it. – Jonathan Leffler Jun 17 '11 at 05:44
  • I did say mainstream: DB2, Oracle, SQL Server and Sybase don't. So we've covered 85% of the market :-) – gbn Jun 17 '11 at 05:49
  • @gbn is correct that the OP's code is not valid SQL-92 syntax, which infers it is not valid SQL-86 syntax either. – onedaywhen Jun 17 '11 at 08:32
3

My guess is that your database product allows for referencing columns in the Group By by position as opposed to only by column name (i.e., 1 for the first column, 2 for the second column etc.) If so, this is a proprietary feature and is not recommended because of portability and (arguably) readability issues (But can admittedly be handy for a quick and dirty query).

Thomas
  • 63,911
  • 12
  • 95
  • 141
1

Tried kind a same query in MS SQL Server 2005

select distinct host from some_table group by 1,2,3

It error's out saying

Each GROUP BY expression must contain at least one column that is not an outer reference.

So this indicates that those 1,2,3 are nothing but column outer referrence

Rahul
  • 76,197
  • 13
  • 71
  • 125