SQL How to use SUM and Group By with multiple tables?

Question

I have three tables.

For each "id" value, I would like the sum of the col1 values, the sum of col2 values & the sum of col3 values listed separately. I am not summing across tables.

table a
num  | id  |  col1
================
1     100     0
2     100     1
3     100     0

1     101     1
2     101     1
3     101     0

table b
idx  | id  | col2
=================
1     100    20
2     100    20
3     100    20

4     101   100
5     101   100

table c
idx | id | col3
==============================
1     100     1 
2     100     1
3     100     1

4     101    10
5     101     1

I would like the results to look like this,

ID | sum_col1 | sum_col2 | sum_col3
====================================
100      1         60          3    
101      2        200         11

Here is my query which runs too long and then times out. My tables are about 25,000 rows.

SELECT a.id as id,
SUM(a.col1) as sum_col1,
SUM(b.col2) as sum_col2,
SUM(c.col3) as sum_col3 
FROM a, b, c
WHERE a.id=b.id
AND a=id=c.id
GROUP by id 
Order by id desc

The number of rows in each table may be different, but the range of "id" values in each table is the same.

This appears to be a similar question, but I can't make it work,

Mysql join two tables sum, where and group by

Try `Group By a.id` and `Order by a.id desc`. You should move away from the old style joins and use the `JOIN` — SS_DBA, Nov 21 '17 at 16:13
Do all tables always have all the ID numbers in? Or is there a case where table a has ID 107 but this is not present in b or c, and table b has 199 but this is not present in a or c, etc — Caius Jard, Nov 21 '17 at 16:16
Value of column "id" is not unique in your example.Using a non-unique key to join the data will produce Cartesian product, which is the likely cause of the performance issue. In the join use the a.num = b.idx and b.idx = c.idx. Other option is to do the SUM of each table separately in subqueries and join the results. — Ramesh, Nov 21 '17 at 16:16
Yes - all tables have all of the ID numbers, but not necessarily the same quantity of rows. — zuma, Nov 21 '17 at 16:29
Then a join solution will work.. until it doesn't (whereas a union solution will always work) — Caius Jard, Nov 21 '17 at 16:36

Ramesh · Accepted Answer · 2017-11-21T16:48:00.270

Here is a solution based on your data. Issue with your query is that you were joining tables on a non-unique column resulting in Cartesian product.

Data

DROP TABLE IF EXISTS A;

CREATE TABLE A
(num int,
 id int,
 col1 int);

 INSERT INTO A VALUES (1,     100,     0);
 INSERT INTO A VALUES (2,     100,     1);
 INSERT INTO A VALUES (3,     100,     0);
 INSERT INTO A VALUES (1,     101,     1);
 INSERT INTO A VALUES (2,     101,     1);
 INSERT INTO A VALUES (3 ,    101,     0);

DROP TABLE IF EXISTS B;

 CREATE TABLE B
(idx int,
 id int,
 col2 int);

 INSERT INTO B VALUES (1,     100,     20);
 INSERT INTO B VALUES (2,     100,     20);
 INSERT INTO B VALUES (3,     100,     20);
 INSERT INTO B VALUES (4,     101,     100);
 INSERT INTO B VALUES (5,     101,     100);

DROP TABLE IF EXISTS C;

CREATE TABLE C
(idx int,
 id int,
 col3 int);

 INSERT INTO C VALUES (1,     100,     1);
 INSERT INTO C VALUES (2,     100,     1);
 INSERT INTO C VALUES (3,     100,     1);
 INSERT INTO C VALUES (4,     101,     10);
 INSERT INTO C VALUES (5,     101,     1);

Solution

SELECT a_sum.id, col1_sum, col2_sum, col3_sum
 FROM (SELECT id, SUM(col1) AS col1_sum
         FROM a
        GROUP BY id ) a_sum
      JOIN 
      (SELECT id, SUM(col2) AS col2_sum
         FROM b
        GROUP BY id ) b_sum 
      ON (a_sum.id = b_sum.id)
      JOIN 
      (SELECT id, SUM(col3) AS col3_sum
         FROM c
        GROUP BY id ) c_sum 
      ON (a_sum.id = c_sum.id);

Result is as expected

Note: Do outer joins if an id doesnt have to be present in all three tables.

Thank you!! This works and it is fast. I even understand it. — zuma, Nov 21 '17 at 16:58

backbone · Answer 2 · 2017-11-21T16:39:12.853

1

Maybe this will do?

Haven't got a chance to run it, but i think it can do the job.

SELECT sumA.id, sumA.sumCol1, sumB.sumCol2, sumC.sumCol3
FROM
(SELECT id, SUM(col1) AS sumCol1 FROM a GROUP BY id ORDER BY id ASC) AS sumA
JOIN (SELECT id, SUM(col2) AS sumCol2 FROM b GROUP BY id ORDER BY id ASC) AS sumB ON sumB.id = sumA.id
JOIN (SELECT id, SUM(col3) AS sumCol3 FROM c GROUP BY id ORDER BY id ASC) AS sumC ON sumC.id = sumB.id
;

EDIT

SELECT IF(sumA.id IS NOT NULL, sumA.id, IF(sumB.id IS NOT NULL, sumB.id, IF(sumC.id IS NOT NULL, sumC.id,''))),,
sumA.sumCol1, sumB.sumCol2, sumC.sumCol3
FROM
(SELECT id, SUM(col1) AS sumCol1 FROM a GROUP BY id ORDER BY id ASC) AS sumA
OUTER JOIN (SELECT id, SUM(col2) AS sumCol2 FROM b GROUP BY id ORDER BY id ASC) AS sumB ON sumB.id = sumA.id
OUTER JOIN (SELECT id, SUM(col3) AS sumCol3 FROM c GROUP BY id ORDER BY id ASC) AS sumC ON sumC.id = sumB.id
;

edited Nov 21 '17 at 16:39

answered Nov 21 '17 at 16:20

backbone

98
1
1
8

Note, that this only works if every table has a value for every id.. If suddenly table a has a value for id 199, and b has 199 but c does not, then 199 will disappear from the result set – Caius Jard Nov 21 '17 at 16:26
Yes that's why you need to do left join – Daniel E. Nov 21 '17 at 16:26
No, because then if 199 is in b and c but not a, it disappears (a left join b left join c) - really you need full outer join to cater for any table not having a value, and then you need ugly join conditions like a.id =coalesce(b.id,c.id) – Caius Jard Nov 21 '17 at 16:27
Truly, union is the most tolerant way to do this, and it's not an automatic performance hit; the database reads 75,000 rows, and it doesnt matter if it dumps them into 1 hash table or 3.. It's not an "automatic performance hit to use union" – Caius Jard Nov 21 '17 at 16:30
i will change my JOIN to OUTER JOIN. which is what i meant. like in this answer here: [link](https://stackoverflow.com/a/6188334/3412965) – backbone Nov 21 '17 at 16:30
The join solution worked. The outer join solution error'd near the first outer join statement. Also - is there an extra comma after the IF statement? I tried it both ways. – zuma Nov 21 '17 at 17:38

Caius Jard · Answer 3 · 2017-11-22T15:44:14.767

I would do the summing first, then union the results, then pivot them round:

SELECT
  id,
  MAX(CASE WHEN which = 'a' then sumof end) as sum_a,
  MAX(CASE WHEN which = 'b' then sumof end) as sum_b,
  MAX(CASE WHEN which = 'c' then sumof end) as sum_c
FROM
(
  SELECT id, sum(col1) as sumof, 'a' as which FROM a GROUP BY id
  UNION ALL
  SELECT id, sum(col2) as sumof, 'b' as which FROM b GROUP BY id
  UNION ALL
  SELECT id, sum(col3) as sumof, 'c' as which FROM c GROUP BY id
) a
GROUP BY id

You could also union, then sum:

SELECT
  id,
  SUM(CASE WHEN which = 'a' then v end) as sum_a,
  SUM(CASE WHEN which = 'b' then v end) as sum_b,
  SUM(CASE WHEN which = 'c' then v end) as sum_c
FROM
(
  SELECT id, col1 as v, 'a' as which FROM a GROUP BY id
  UNION ALL
  SELECT id, col2 as v, 'b' as which FROM b GROUP BY id
  UNION ALL
  SELECT id, col3 as v, 'c' as which FROM c GROUP BY id
) a
GROUP BY id

You cant easily use a join, unless all tables have all values of ID, in which case I'd say you can sum them as subqueries and then join the results together.. But if one of your tables suddenly lacks an id value that the other two tables have, that row disappears from your results (unless you use full outer join and some really ugly coalescing in your ON clause)

Using union in this case will give you a more missing-value-tolerant result set, as it can cope with missing values of ID in any table. Thus, we union the tables together into one dataset, but use a constant to track which table the value came from, that way we can pick it out into its own summation later

If any id value is not present in any table, then the sum for that column will be null. If you want it to be 0, you can change the MAX to SUM or wrap the MAX in a COALESCE

You Really think that 3 union is a good way to improve the performance ? — Daniel E., Nov 21 '17 at 16:23
It makes little difference; the query will have to access all the data in all the tables in order to compute the sum. Whether this is union or join is irrelevant — Caius Jard, Nov 21 '17 at 16:25
I tried this, but got "Every derived table must have its own alias" errors. — zuma, Nov 21 '17 at 18:11
Put a letter `a` here: `) GROUP BY id` -> `) a GROUP BY id`. It can be any character actually; all mysql is asking you to do is give the sub query (in the brackets) a name (some bunch of characters immediately following the last bracket). Apologies for the typo; i'll fix the post — Caius Jard, Nov 22 '17 at 15:44

SQL How to use SUM and Group By with multiple tables?

3 Answers3