SQL conditional grouping and summing

Question

My data looks like this:

|cat |subcat |amount|
---------------------
|A   |1      |123   |
|A   |2      |456   |
|B   |1      |222   |
|B   |2      |333   |

In the first case, I need to sum by cat and subcat. Easy:

SELECT cat, subcat, sum(amount) FROM data GROUP BY cat, subcat

Next, I have a more sophisticated requirement where for certain cats, the amount should be "pushed" into a given subcat. This can be stored in another config table:

|cat |subcat|
-------------
|B   |1     |

This tells me that for all cat='B' rows, the amount should be treated as a subcat=1. Furthermore, where cat='B' AND subcat <> 1 the amount should be reported as zero. In other words, the result I need is:

|cat |subcat|amount|
|A   |1     |123   |
|A   |2     |456   |
|B   |1     |555   |
|B   |2     |0     |

I cannot update my data table. Of course I can SELECT ... INTO in a proc and fix the data there, but I'm wondering if it can be done in one hit.

I can get pretty close with:

SELECT data.cat,
    ISNULL(config.subcat, data.subcat),
    SUM(amount)
FROM data
    LEFT OUTER JOIN config ON (data.cat = config.cat)
GROUP BY data.cat, ISNULL(config.subcat, data.subcat)

...but fails my second requirement to show cat:B, subcat:2 as zero.

Is it possible?

I'm using Sybase IQ 12.5 (i.e. old T-SQL, but is has the case statement, which I suspect might be useful)

I don't quite follow this part: "This tells me that for all cat='B' rows, the amount should be treated as a subcat=1. Furthermore, where cat='B' AND subcat <> 1 the amount should be reported as zero." From the first sentence, [B, 2] should be treated as an amount of 222 rather than 333, but the second sentence, [B, 2] should be treated as 0 rather than 333. Can you show a table of how the values should be treated when the rule is applied? — Glenn, Dec 06 '11 at 01:52

score 1 · Answer 1 · answered Dec 06 '11 at 03:26

Here's what I came up with.

SELECT cat, subcat, sum(amount)
FROM
(
    SELECT d.cat,
        d.subcat,
        CASE WHEN c.subcat <> d.subcat THEN 0 ELSE amount END amount
    FROM data d
        LEFT OUTER JOIN config c ON (d.cat = c.cat)
    UNION    
    SELECT d.cat,
        ISNULL(c.subcat, d.subcat),
        amount
    FROM data d
        LEFT OUTER JOIN config c ON (d.cat = c.cat)
    WHERE c.subcat <> d.subcat
) AS data2
GROUP BY cat, subcat

Given that it uses a derived table with a union, and that my actual data set is much larger than the one I gave in the question, I think a SELECT ... INTO followed by an update might actually be the more performant approach!

score 1 · Answer 2 · edited Jan 18 '21 at 12:32

You'll need to a join Data -> Config -> Data to translate the B2 to B1 then UNION that to a SELECT with a Case statement and then the SUM and GROUP BY is easy

SELECT
   t.CAT, 
   t.SUBCAT, 
   SUM(t.AMOUNT) AMOUNT
FROM
(
SELECT d.cat, 
       d.subcat, 
       CASE 
         WHEN c.subcat IS NULL 
               OR c.subcat = d.subcat THEN d.amount 
         ELSE 0 
       END AS amount 
FROM   data d 
       LEFT JOIN config c 
         ON d.cat = c.cat 
         
UNION ALL 

SELECT d.cat, 
       d.subcat, 
       d2.amount 
FROM   data d 
       INNER JOIN config c 
         ON ( d.cat = c.cat ) 
       INNER JOIN data d2 
         ON c.cat = d2.cat 
            AND c.subcat <> d2.subcat 
            AND c.subcat = d.subcat 
) t
GROUP BY
    cat,
    subcat
ORDER BY
    cat,
    subcat

You can see a working example at this data.se query.

Note I added a third "B" value to test where there more than one rolled up SubCat

Another approach that uses the WITH and ROLLUP clauses (which are supported in some versions of Sybase I don't know which)

with g as ( 
    SELECT 
   
        d.cat, 
        d.subcat,
        c.subcat config_subcat, 
        sum(amount) amount,
        GROUPING(c.subcat) subcatgroup
    FROM   data d 
    LEFT JOIN config c
    ON d.cat = c.cat 
    
   GROUP BY
      d.cat, 
      d.subcat,
      c.subcat with rollup
)

SELECT
   g.cat, 
   g.subcat,
   case when g.config_subcat is null then g.amount 
     WHEN g.subcat = g.config_subcat THEN g2.amount 
     ELSE 0 end amount
FROM g 

     LEFT JOIN g g2
     ON g.cat = g2.cat and g2.subcatgroup= 1
     and g.subcat is not null and g2.subcat is null

WHERE g.subcatgroup= 0

Which can be viewed at this data.se query

What if `config` contains a subcategory not found among those in `data` for a particular category? Your solutions do not seem to account for that. — Andriy M, Dec 06 '11 at 09:48
No it doesn't was that a requirement? I guess that could be fixed with a full outer join plus some coalsces — Conrad Frix, Dec 06 '11 at 14:55
No, the OP says nothing about that, so, of course, it's all right to assume that the situation is either impossible or being accounted for elsewhere. And other than that your solutions work correctly. — Andriy M, Dec 06 '11 at 17:11

score 0 · Answer 3 · edited Jan 18 '21 at 12:30

0

I'm a little confused by the requirements, but I think this is what you want.

SELECT d.cat,
       d.subcat, 
       SUM(CASE 
           WHEN c.subcat IS NULL OR c.subcat = d.subcat 
           THEN d.amount 
           ELSE 0 
        END) as Amount
FROM @Data d
    LEFT OUTER JOIN @Config c ON (d.cat = c.cat)
GROUP BY d.cat, d.subcat
ORDER BY d.cat

Example here - https://data.stackexchange.com/stackoverflow/q/120507/

Let me know if that's not what you were going for.

edited Jan 18 '21 at 12:30

Community

1
1

answered Dec 06 '11 at 02:24

Jeremy Wiggins

7,239
6
41
56

The result of this has row `B|2|0` (good) but has `B|1|222` whereas I want `B|1|555` – Robert Brown Dec 06 '11 at 02:42

walterhuang · Answer 4 · 2011-12-07T01:15:54.707

I am using tsql and here is my code. It's ugly but works. Actually, I like your pretty-close approach (if you don't insist showing B2 = 0).

SELECT A.cat,
       A.subcat,
       CASE WHEN B.IsConfig = 0 THEN A.amount
            WHEN B.IsConfig = 1 AND C.cat IS NULL THEN 0
            ELSE B.amount 
       END AS amount
FROM data A
INNER JOIN 
(
    SELECT B1.cat, B1.amount, CASE WHEN C1.cat IS NULL THEN 0 ELSE 1 END AS IsConfig
    FROM
    (
        SELECT cat, SUM(amount) amount
        FROM data
        GROUP BY cat
    ) B1 LEFT OUTER JOIN config C1 ON B1.cat = C1.cat
) B ON A.cat = B.cat
LEFT OUTER JOIN config C ON A.cat = C.cat AND A.subcat = C.subcat

--- I can't comment on others so I add my question here ---

Compared my code with others using Execution Plan, my query cost is 46%. Does that mean it's more efficient? Or it just depends :)

score 0 · Answer 5 · answered Dec 06 '11 at 05:10

Compute SUM(amount) for all "cat"s referenced in "config" in a derived table, then match that with your "data" table entries as appropriate:

   SELECT data.cat,
          data.subcat,
          CASE 
            WHEN dt.subcat IS NULL       -- no "config" entry for cat
              THEN data.amount
            WHEN dt.subcat = data.subcat -- "config" for cat and subcat
              THEN dt.total
            ELSE 0                       -- "config" for cat not subcat
          END AS amount
     FROM data
LEFT JOIN (  SELECT config.cat,
                    config.subcat,
                    SUM(data.amount) AS total
               FROM config
               JOIN data USING (cat)
           GROUP BY 1, 2 ) dt
           USING (cat);

+-----+--------+--------+
| cat | subcat | amount |
+-----+--------+--------+
| A   |      1 |    123 |
| A   |      2 |    456 |
| B   |      1 |    555 |
| B   |      2 |      0 |
+-----+--------+--------+
4 rows in set (0.00 sec)

score 0 · Answer 6 · answered Dec 06 '11 at 09:46

This is a bit similar to your solution, but UNION is only used to build a list of categories and subcategories. The list is then joined with another derived table, which is essentially the same as the right part of your UNION. Here goes:

SELECT s.cat, s.subcat, ISNULL(SUM(d.amount), 0)
FROM (
  SELECT cat, subcat FROM data
  UNION
  SELECT cat, subcat FROM config
) s
  LEFT JOIN (
    SELECT
      d.cat,
      subcat = ISNULL(c.subcat, d.subcat),
      d.amount
    FROM data d
      LEFT JOIN config c ON d.cat = c.cat
  ) d ON s.cat = d.cat AND s.subcat = d.subcat
GROUP BY s.cat, s.subcat

SQL conditional grouping and summing

6 Answers6