GROUP BY + CASE statement

Question

I have a working query that is grouping data by hardware model and a result, but the problem is there are many "results". I have tried to reduce that down to "if result = 0 then keep as 0, else set it to 1". This generally works, but I end up having:

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    42
 2013-11-06 | modelA         |    1 |    1 |     2
 2013-11-06 | modelA         |    1 |    1 |    11
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    16
 2013-11-06 | modelB         |    1 |    1 |     8
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |     5
 2013-11-06 | modelB         |    3 |    1 |     7
 2013-11-06 | modelB         |    3 |    1 |   563

Instead of the aggregate I am trying to achieve, where only 1 row per type/case combo.

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    55
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    24
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |   575

Here is my query:

select CURRENT_DATE-1 AS day, model.name, attempt.type, 
       CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END, 
       count(*) 
from attempt attempt, prod_hw_id prod_hw_id, model model
where time >= '2013-11-06 00:00:00'  
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
group by model.name, attempt.type, attempt.result
order by model.name, attempt.type, attempt.result;

Any tips on how I can achieve this would be awesome.

Day will always be defined in the WHERE clause, so it will not vary. name, type, result(case) and count will vary. In short, for any given model I want only 1 row per "type + case" combo. As you can see in the first result set I have 3 rows for modelA that have type=1 and case=1 (because there are many "result" values that I have turned into 0=0 and anything else=1). I want that to be represented as 1 row with the count aggregated as in example data set 2.

*Which* row per type/case. Define your pick. Can day and name vary? — Erwin Brandstetter, Nov 07 '13 at 23:44
@ErwinBrandstetter, for 1 model(name), only 1 row per type/result(case) combo. there are many models. — ssbsts, Nov 07 '13 at 23:54
There will always be at least 1 row for each model, but likely more than that because of succ/fail and different type. I will add a bit more to the data set example. Perhaps that would be more helpful. I am trying to group by model, type, case; yes. Generally it works, but I just wanted to aggregate the count on (otherwise) unique rows where only the count is different. — ssbsts, Nov 08 '13 at 00:02

Erwin Brandstetter · Accepted Answer · 2022-11-13T00:47:52.550

Your query would work already - except that you are running into naming conflicts or just confusing the output column (the CASE expression) with source column result, which has different content.

...
GROUP BY model.name, attempt.type, attempt.result
...

You need to GROUP BY your CASE expression instead of your source column:

...
GROUP BY model.name, attempt.type
       , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END
...

Or provide a column alias that's different from any column name in the FROM list - or else that column takes precedence:

SELECT ...
     , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END AS result1
...
GROUP BY model.name, attempt.type, result1
...

The SQL standard is rather peculiar in this respect. Quoting the manual here:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

And:

If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.

Bold emphasis mine.

These conflicts can be avoided by using positional references (ordinal numbers) in GROUP BY and ORDER BY, referencing items in the SELECT list from left to right. See solution below.
The drawback is that this may be harder to read and vulnerable to edits in the SELECT list: one might forget to adapt positional references accordingly.

But you do not have to add the column day to the GROUP BY clause, as long as it holds a constant value (CURRENT_DATE-1).

Rewritten and simplified with proper JOIN syntax and positional references it could look like this:

SELECT m.name
     , a.type
     , CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result
     , CURRENT_DATE - 1 AS day
     , count(*) AS ct
FROM   attempt    a
JOIN   prod_hw_id p USING (hard_id)
JOIN   model      m USING (model_id)
WHERE  ts >= '2013-11-06 00:00:00'  
AND    ts <  '2013-11-07 00:00:00'
GROUP  BY 1,2,3
ORDER  BY 1,2,3;

I avoided the column name time. That's a reserved word and should not be used as identifier. Besides, your "time" obviously is a timestamp or date, so that was rather misleading.

You should probably mention the option of using positional (numeric) column references while you're talking about naming conflicts, rather than just using an example that doesn't match the prose advice to that point. I still find that syntax much less readable myself, but I know some prefer it. — IMSoP, Nov 08 '13 at 00:46
Very well explained otherwise, by the way; good work quoting the manual (and thus indirectly referencing the standard) for when and how you can use output column names. — IMSoP, Nov 08 '13 at 00:48
@IMSoP: Hmm .. but I do mention positional parameters at the end. I already put so much into this answer, so I kept it short. The reader can turn to the manual for more epic reading. The link is there .. — Erwin Brandstetter, Nov 08 '13 at 00:51
Fair enough on the length. It just felt odd after all that talk about how to name output columns to not actually use output column names at all, but a completely different solution to the problem. Perhaps just before the example say "Rewritten ... using another alternative, which is referring to output columns by their position" or something. (Also, those aren't positional *parameters* surely? Just positional *references*...) — IMSoP, Nov 08 '13 at 01:07
@IMSoP: You are right, the term is misleading. Updated accordingly. — Erwin Brandstetter, Nov 08 '13 at 01:29
@ErwinBrandstetter thanks so much for the detailed response. Works very well for my use case. — ssbsts, Nov 08 '13 at 15:38

score 18 · Answer 2 · edited Jan 05 '17 at 00:31

18

can you please try this: replace the case statement with the below one

Sum(CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END) as Count,

edited Jan 05 '17 at 00:31

RBT

24,161
21
159
240

answered Nov 03 '15 at 20:57

Malathi

181
1
2

I had no idea that you could wrap a case in sum and avoid the group by nightmare that follows complex case statements. The possibilities... – David Barnett Sep 06 '21 at 12:28

score 6 · Answer 3 · edited Apr 19 '20 at 08:32

6

Aliases can be used only if they were introduced in the preceding step. So aliases in the SELECT clause can be used in the ORDER BY but not the GROUP BY clause.

Reference: Microsoft T-SQL Documentation for further reading.

FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP

Hope this helps.

edited Apr 19 '20 at 08:32

SHR

7,940
9
38
57

answered Apr 19 '20 at 07:11

vinod kumar

61
1
2

score 2 · Answer 4 · answered Nov 07 '13 at 23:44

2

Try adding the other two non COUNT columns to the GROUP BY:

select CURRENT_DATE-1 AS day, 
model.name, 
attempt.type, 
CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END, 
count(*) 
from attempt attempt, prod_hw_id prod_hw_id, model model
where time >= '2013-11-06 00:00:00'  
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
group by 1,2,3,4
order by model.name, attempt.type, attempt.result;

answered Nov 07 '13 at 23:44

Filipe Silva

21,189
5
53
68

I assume you mean to add "day" to the group by? I tried that and the result is the same. Did you mean something else? I have 5 fields in the select and 3 are already there, so day is the only other non-count field. – ssbsts Nov 07 '13 at 23:53
Yes. i put them on my answer: 1,2,3,4. All except the COUNT. – Filipe Silva Nov 07 '13 at 23:54

score 2 · Answer 5 · answered Mar 30 '17 at 14:40

For TSQL I like to encapsulate case statements in an outer apply. This prevents me from having to have the case statement written twice, allows reference to the case statement by alias in future joins and avoids the need for positional references.

select oa.day, 
model.name, 
attempt.type, 
oa.result
COUNT(*) MyCount 
FROM attempt attempt, prod_hw_id prod_hw_id, model model
WHERE time >= '2013-11-06 00:00:00'  
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
OUTER APPLY (
    SELECT CURRENT_DATE-1 AS day,
     CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END result
    ) oa    
group by oa.day, 
model.name, 
attempt.type, 
oa.result
order by model.name, attempt.type, oa.result;

How is TSQL be applicable to a [postgresql] question? (It isn't.) — Erwin Brandstetter, Mar 30 '17 at 20:44
Oddly enough ^^ when i googled how to avoid duplicate case statement in group by this answer helped me out (yes for tSQL :D ) — felickz, May 27 '21 at 17:20

GROUP BY + CASE statement

5 Answers5

Linked

Related