7

I'm trying to refine a SQL query to make my reports looks better. My query reads data from one table, groups by a few colums and calculates some aggregate fields (counts and sums).

SELECT A, B, C, COUNT(*), SUM(D) FROM T
GROUP BY A, B, C
ORDER BY A, B, C

Now, let's assume B and C columns are some defined constant strings, for example, B can be 'B1' or 'B2', C can be 'C1' or 'C2'. So, an example resultset is:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A1 | B1 | C1 |       34 |   1752
A1 | B1 | C2 |        4 |    183
A1 | B2 | C1 |      199 |   8926
A1 | B2 | C2 |       56 |   2511
A2 | B1 | C2 |        6 |     89
A2 | B2 | C2 |       12 |    231
A3 | B1 | C1 |       89 |    552
...

As you can see, for 'A1' I have all four possible (B, C) combination, but that's not true for 'A2'. My question is: how can I generate also summary rows for (B, C) combination not present, in fact, in the given table? That is, how can I print, for example, also these rows:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A2 | B1 | C1 |        0 |      0
A2 | B2 | C1 |        0 |      0

The only solution I can see is to create some auxiliarity tables with all (B, C) values and then make a RIGHT OUTER JOIN with that aux table. But I'm searching for a cleaner way...

Thank you all.

caitriona
  • 8,569
  • 4
  • 32
  • 36
lorenzo-s
  • 16,603
  • 15
  • 54
  • 86
  • What are you using mssql, oracel, mysql? – Arion May 10 '12 at 07:43
  • Oracle. But the query is written, at the moment, in a standard way (it runs on MySQL too, we have a MySQL copy of the DB for no question-relevant reason). – lorenzo-s May 10 '12 at 07:44
  • Maybe [this](http://www.techrepublic.com/article/group-by-grouping-sets-for-custom-rollups-in-oracle/6134424) helps. Grouping sets are available in both SQL Server and Oracle. Not sure about MySQL. – dan radu May 10 '12 at 07:50
  • @dradu Very interesting. I've never hear of CUBE and GROUPING SETS before. Maybe bacuse I'm mostly a MySQL user. Anyway, it does not help: both CUBE and GROUPING SETS produce any kind of grand-total, but not generate empty summaries. – lorenzo-s May 10 '12 at 08:04
  • I have solved a similar problem using `CROSS JOIN` in a subselect, here: http://stackoverflow.com/questions/9924433/mysql-select-query-using-count – Lukas Eder May 10 '12 at 13:02
  • 1
    What is "unclean" about using a right outer join to solve the problem? That seems like the right solution. – Gordon Linoff May 10 '12 at 14:52
  • @GordonLinoff Nothing. Maybe "cleaner" is not the right word. My knowledge about Oracle DBMS is very limited, and I was curious to know if there were other solution involving less code. – lorenzo-s May 10 '12 at 15:06

3 Answers3

2

The auxiliary table doesn't have to be a real table, it can be a common table expression - at least if you can get all possible values (or all you're interested in) from the table itself. Using @Bob Jarvis' query to generate all possible combinations you can do something like:

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT DISTINCT b, c FROM T) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;

If you have fixed values that may not be in the table then it's a little more complicated (or uglier anyway, and gets worse with more possible values):

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT 'B1' AS B FROM DUAL
        UNION ALL SELECT 'B2' FROM DUAL) ON (1 = 1)
    JOIN (SELECT 'C1' AS C FROM DUAL
        UNION ALL SELECT 'C2' FROM DUAL) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;

But you have to join to something that knows about the 'missing' values. If the same logic is needed elsewhere, and you have fixed values, then a permanent table might be cleaner - maintenance may be needed either way of course. You could also consider a pipelined function to act as a surrogate table, but depends on volumes maybe.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • 1
    The table itself contains all the possibilities. Obviously, I was NOT expecting the DB engine will "guess" them. +1 for the interesting usage of `WITH`, I've never seen t before. – lorenzo-s May 10 '12 at 15:09
1

The thing is, if you don't have a particular combination in your database, how would an engine know to include that combination into the results? In order to have all combinations in the results, you need to have all combinations available - whether in the main table or in some other table used for referencing. For example, you can create another table R with data like so:

A  | B  | C  
------------
A1 | B1 | C1
A1 | B1 | C2
A1 | B2 | C1
A1 | B2 | C2
A2 | B1 | C1
A2 | B1 | C2
A2 | B2 | C1
A2 | B2 | C2
A3 | B1 | C1
A3 | B1 | C2
A3 | B1 | C1
A3 | B2 | C2
...

And then your query would look like this:

SELECT r.*, COUNT(t.d), coalesce(SUM(t.d), 0)
FROM r LEFT OUTER JOIN t on (r.a=t.a and r.b=t.b and r.c=t.c)
GROUP BY r.a, r.b, r.c
ORDER BY r.a, r.b, r.c

This will return you the set as you want with 0 | 0 for combination that don't exist in the main table. Note that this is only possible if you do know every possible combination you want to include, which may not always be the case.

If on the other hand your A, B, C are numerical values and you just want to include all numbers in a range, then there may be another way of dealing with this, something like this:

SELECT a.n, b.n, c.n, COUNT(t.d), coalesce(SUM(t.d), 0)
FROM (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_a CONNECT BY LEVEL <= end_a) a,
     (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_b CONNECT BY LEVEL <= end_b) b,
     (SELECT (rownum) "n" FROM DUAL WHERE LEVEL >= start_c CONNECT BY LEVEL <= end_c) c,
     t
WHERE a.n = t.a(+) AND b.n = t.b(+) AND c.n = t.c(+)
GROUP BY a.n, b.n, c.n
ORDER BY a.n, b.n, c.n

(I don't have an Oracle instance handy to test this, so this is more of a somewhat educated guess rather than anything else.)

The bottom line is the engine needs to know what to include into the final results - one way or another.

Aleks G
  • 56,435
  • 29
  • 168
  • 265
  • That's how I'm doing it right now. See the last paragraph in my question. – lorenzo-s May 10 '12 at 14:16
  • @lorenzo-s The main problem is that without you specifying all valid combination, the engine has no way of knowing, therefore you do need to have this temporary table. – Aleks G May 10 '12 at 14:24
  • The table itself (considering all the rows) contains all combinations of (B, C) values. – lorenzo-s May 10 '12 at 15:10
0

There's probably prettier ways to do this, but the following should get you started towards what you want:

SELECT * FROM 
  (SELECT DISTINCT a FROM T)
JOIN
  (SELECT DISTINCT b, c FROM T)
  ON (1 = 1)
ORDER BY a, b, c

This will give you all combinations which exist of B and C, together with all A's which exist, similar to

A1  B1  C1
A1  B1  C2
A1  B2  C1
A1  B2  C2
A2  B1  C1
A2  B1  C2
A2  B2  C1
A2  B2  C2

Share and enjoy.

  • I'm searching for something (and I'm not assuming it exixts!) that does not involve a join or so... I'm currently using a table like the one you wrote to achieve the result. See the last paragraph in my question. – lorenzo-s May 10 '12 at 14:23
  • The issue here is different. If table T contains all possible combinations, then you're ok, however it it doesn't, then you're back to square one. Somehow the DB engine needs to know what all the possible combinations are. – Aleks G May 10 '12 at 14:26
  • @AleksG Ok, ok, we do not have this issue: the table itself (considering all the rows) contains all combinations of (B, C) values. Obviously, I was NOT expecting the DB engine will "guess" them. – lorenzo-s May 10 '12 at 15:11