2

There are many grouping sets examples on the internet like query Q1 in the example below. But query Q2 is different because A2 is a grouping column and it is used as the argument to SUM().

Which one of the following is correct for Q2 according to the SQL Standard (any version since 2003 that supports grouping sets)? If (1) is correct, please explain why with reference to the Standard.

  1. A2 is replaced by NULL unless it is in an argument to an aggregate. This interpretation would give results R1 below. This is Oracle's behaviour (which seems more useful).

  2. A2 is replaced by NULL including where it is used in an aggregate: this means that the aggregate will return NULL. This interpretation would give results R2 below. This is how I have understood the SQL Standard (possibly incorrectly).

Example code:

-- Setup
create table A (A1 int, A2 int, A3 int);
insert into A values (1, 1, 100);
insert into A values (1, 2, 40);
insert into A values (2, 1, 70);
insert into A values (5, 1, 90);

-- Query Q1
-- Expected/Observed results:
--
--         A1         A2    SUM(A3)
-- ---------- ---------- ----------
--          1          -        140
--          2          -         70
--          5          -         90
--          -          1        260
--          -          2         40
--          -          -        300
select A1, A2, sum (A3)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;

-- Query Q2
-- Results R1 (Oracle):
--         A1         A2    SUM(A2)
-- ---------- ---------- ----------
--          1          -          3
--          2          -          1
--          5          -          1
--          -          1          3
--          -          2          2
--          -          -          5
-- 
-- Results R2 (SQL Standard?):
--         A1         A2    SUM(A2)
-- ---------- ---------- ----------
--          1          -          - 
--          2          -          - 
--          5          -          - 
--          -          1          3
--          -          2          2
--          -          -          -   -- NULL row
select A1, A2, sum (A2)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;

I am aware of this from SQL 2003 7.9 Syntax 17, which describes how columns are replaced with NULLs. However, I might have missed or misunderstood a rule elsewhere that excludes arguments to aggregates.

m) For each GS_i:
   iii) Case:
        1) If GS_i is an <ordinary grouping set>, then
           A) Transform SL2 to obtain SL3, and transform HC to obtain
              HC3, as follows:
              II) Replace each <column reference> in SL2 and HC that
                  references PC_k by "CAST(NULL AS DTPCk)"
Ben C
  • 658
  • 6
  • 18
  • I dont see the question. – Juan Carlos Oropeza Oct 29 '15 at 13:47
  • @JuanCarlosOropeza My question is at the end - "Have I missed something in the SQL Standard? Or is Oracle non-standard here?". I'll think about rewording a bit... – Ben C Oct 29 '15 at 13:55
  • I think you wrote a lot, but didnt point to the issue you are having problem. Seem to be somekind of difference, but dont know which one you consider wrong. – Juan Carlos Oropeza Oct 29 '15 at 13:58
  • My assumption is that the replacement of `PC_k` with `CAST(NULL AS DTPCk)` does not apply when `PC_k` is contained within an aggregate. The results you are getting with Oracle are the same as the results with SQL Server, and the logical results as replacing values within the aggregate changes the meaning of the aggregate. I have however, (painfully) read through the relevant section of the standard and can't find an explicit reference to this rule, I suspect it is referenced somewhere, but I have lost patience with reading the standard now... – GarethD Oct 29 '15 at 14:04

1 Answers1

0

As with many difficult SQL features, it can help to look at earlier versions of the standard where the phrasing might be simpler. And it turns out that grouping sets were introduced in SQL 1999 and were then revised in SQL 2003.

SQL 1999

Syntax Rule 4 states:

Let SING be the <select list> constructed by removing from SL every <select
sublist> that is not a <derived column> that contains at least one <set
function specification>.

Then Syntax Rule 11 defines PC_k as the column references contained in the group by list. It constructs a derived table projecting the union of GSQQL_i, which are query specifications projecting the PC_k or NULL as appropriate, the PCBIT_i grouping function indicators and SING.

Thus any that contains a set function will not have its argument replaced, and its columns won't be replaced either. So answer (1) is correct.

However, in the following query the GSQQL_i corresponding to the <grand total> doesn't group by C1 so I think it will give an error rather than replacing C1 with NULL for that grouping set.

select C1 + MAX(C2) from T group by grouping sets ((C1), ());

SQL 2003 - 2011

I still don't have a definitive answer for this. It hinges on what they meant (or forgot to specify?) by "references" in the replacement rule. It would be clearer if it said one of "immediately contained", "simply contained" or "directly contained", as defined in ISO 9075-1 (SQL Part 1: Framework).

The note (number 134 in SQL 2003) at the start of General Rules says "As a result of the syntactic transformations specified in the Syntax Rules of this Sub-clause, only primitive <group by clause>s are left to consider." So the aggregate argument either has or has not actually been replaced: we aren't expected to evaluate aggregates in a special way (whereas if General Rule 3 were in effect applied before the NULL substitution of Syntax Rule 17 then answer (1) would be correct).

I found a draft of Technical Corrigendum 5 [pdf], which is a "diff" towards SQL 2003. This includes the relevant changes to on pages 80-87. Unfortunately the bulk of the change has only the brief rationale "Provide a correct, unified treatment of CUBE and ROLLUP". General Rule 3, quoted above, has the rationale "clarify the semantics of column references".

Ben C
  • 658
  • 6
  • 18