sum() does not work properly in MySQL query with multiple joins (group by does not work as expected)

Question

I have tables Orders, Networks and Users and I need to get total orders count and total sum for orders for every user, and users count in the same network that every user have.

SQLFiddle with sample data: http://sqlfiddle.com/#!9/dcbeea/1

users.userid, orders.id - unique primary keys.

In this example check user #24 row: values for total_orders, total_revenue, network_users are not correct.

Current results for user #24: total_orders: 6, total_revenue: 350, network_users: 6.

Expected results for user #24: total_orders: 3, total_revenue: 175, network_users: 2.

This is SQL request:

SELECT u.*,
   count(o.id) AS total_orders,
   sum(o.total) AS total_revenue,
   count(un.userid) as network_users /* Users count in same network */
FROM users u
LEFT JOIN orders o ON o.userid=u.userid
LEFT JOIN users am ON u.ownerid = am.userid
LEFT JOIN users bdr ON u.bdrid = bdr.userid
LEFT JOIN networks n ON u.networkid = n.networkid
LEFT JOIN users un ON n.networkid=un.networkid
GROUP BY u.userid
ORDER BY u.userid DESC;

ISSUE 1: total_orders and total_revenue here return incorrect values (more than it should, looks like it summed few times because of networks table join).

I can fix total_orders by adding distinct - count(distinct(o.id)) AS total_orders, however this does not work for sum because I can't set to sum total only by disctinct ID's, as I see there is no way to set this in SQL.

You can see issue in SQLFiddle example - user #24 should have total_revenue = 175, however you see it calculated as 350. As I see this happens because two different users associated with the same network (Network #1) that user #24 have.

ISSUE 2: count(un.userid) as network_users - this does not work properly if I don't add count(disctinct(un.userid)) as network_users. Without 'distinct' this show me overall networks count as I see (and not a overall users count in network with the same networkid as current user have). In SQL example 'network_users' for user id #24 should be 2 (because only 2 users in this network), however I see 6 in results.

QUESTION: How to change SQL request to get correct expected mathematically results?

Please just give your DDL including constraints, MySQL version & the rest of a [mre]. PS This seems likely to be a common error where people want some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, but they erroneously try to do all the joining then all the aggregating or to aggregate over previous aggregations. [Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs](https://stackoverflow.com/a/45252750/3404097) — philipxy, Nov 12 '19 at 08:51
Please put everything needed for your question in the post, as text when possible, plus a link to fiddle. A post should be self-contained. Also--I just asked this--what is the first point where a subexpression doesn't return what you expect? If it's the SELECT then show the output (SELECT *) from all the joins & explain how it is that you think those SUMs are going to return what you want. (They don't. So why are you summing on that join?) What actual join for each total is actually is the right input for it? Why don't you calculate those? PS Please ask 1 specific clear question. — philipxy, Nov 13 '19 at 01:29
Put all SQL code needed to run an example in your post. That's a SELECT statement plus declarations plus initializations--everything. Plus a link to a fiddle to show that you ran it. [ask] [help] PS Your description of what your code is supposed to do--part of a [mre]--is not clear. Nobody could write code that they are sure is what you need. We can only guess. (Details re wrong output are not helpful.) PS You seem to rely on PK/UNIQUE/FK constraints that you do not declare or mention. PS Suggest you write a separate subquery for each aggregation & join on a common PK/UNIQUE. Good Luck. — philipxy, Nov 13 '19 at 22:05

score 2 · Accepted Answer · answered Nov 14 '19 at 18:46

One way: Count the network_users in a subselect (subquery in the SELECT clause)

SELECT u.userid, 
   count(o.id) AS total_orders,
   coalesce(sum(o.total), 0) AS total_revenue,
   (
     SELECT count(*)
     FROM users un     
     WHERE un.networkid = u.networkid
   ) as network_users
FROM users u
LEFT JOIN orders o ON o.userid=u.userid
GROUP BY u.userid, u.networkid
ORDER BY u.userid DESC;

Result:

| userid | total_orders | total_revenue | network_users |
| ------ | ------------ | ------------- | ------------- |
| 40     | 1            | 75            | 1             |
| 37     | 0            | 0             | 2             |
| 33     | 0            | 0             | 1             |
| 24     | 3            | 175           | 2             |

View on DB Fiddle

Another way: Do "preaggregation" in a derived table (subquery in the FROM clause)

SELECT u.userid,
  count(o.id) AS total_orders,
  coalesce(sum(o.total), 0) AS total_revenue,
  u.network_users
FROM (
  SELECT u.userid, count(un.userid) as network_users
  FROM users u
  LEFT JOIN users un ON un.networkid = u.networkid
  GROUP BY u.userid
) u
LEFT JOIN orders o ON o.userid=u.userid
GROUP BY u.userid
ORDER BY u.userid DESC;

Result:

| userid | network_users | total_orders | total_revenue |
| ------ | ------------- | ------------ | ------------- |
| 40     | 1             | 1            | 75            |
| 37     | 2             | 0            | 0             |
| 33     | 1             | 0            | 0             |
| 24     | 2             | 3            | 175           |

View on DB Fiddle

Some notes:

Make sure userid is UNIQUE or PRIMARY KEY in the users table. Otherwise you can get an error starting from MySQL 5.7.
I removed the JOINs with users am and users bdr. You are not using them in your query. You can put them back, if you want to select anything from them.
I also removed the JOIN with the networks table. You can just join the users table with itself using the networkid column.
Use COALESCE() for SUM() to convert NULL to 0.

Why your query failed to return the expected result? Because you are joining orders from a user with all other users from the same network. Thus the order count and the total amount are multiplied by the number of users in the same network.

Thanks, good answer. What a reason to use COALESCE()? Order totals for users in table can't be NULL (it's always 0 or more). And I think even if it will be NULL - SUM() should correctly counts for NULL's? (for exam if it will do 5+0+3+NULL+2 - NULL will not be considered as 0 in SUM() automatically in MySQL? Or this will give some error?). — Dmitry, Nov 14 '19 at 19:26
Remove it, and you will see. What you assume is correct for `COUNT()` but not for `SUM()`. If there are no values for `SUM()` at all (user has no orders), the it will return `NULL`. — Paul Spiegel, Nov 14 '19 at 19:29
"you are joining orders from a user with all other users from the same network" - but I am using "u" shortname for one join and "un" shortname for another, so users tables should be joined independently, or this is not how JOIN works in MySQL? If I will join the same table (no matter of short names) few times - it will be merged with every join for same table? So I should avoid using JOIN for the same table name in one query and should always divide this to subqueries? — Dmitry, Nov 14 '19 at 19:30
Joining a table with itself is not the problem. The problem is joining two independent one-to-many relations in the same FROM clause. — Paul Spiegel, Nov 14 '19 at 19:35
Regarding your first comment - see the [documentation](https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html#function_sum): "If the return set has no rows, SUM() returns NULL" and "If there are no matching rows, SUM() returns NULL." - `SUM()` doestn't convert NULLs to `0`, it's just ignoring them. That's why `SUM(5+0+3+NULL+2)` works - It's equivalent to `SUM(5+0+3+2)`. — Paul Spiegel, Nov 14 '19 at 19:48

Rick James · Answer 2 · 2019-11-15T18:06:45.117

1

Look at it this way. When doing a JOIN, first all combinations of the rows from the tables are put together into a big temp table (after filtering out any that don't apply).

Then aggregates such as COUNT() and SUM() are computed against this big table.

Usually this is not correct. And usually the cure is to first devise a query that uses the minimum number of tables needed to get the correct sum from the intermediate table. Then do any more JOINs you may need.

An alternative (sometimes) is to employ subqueries, either to do the aggregation or to provide the equivalent of JOINs.

Sometimes the query will look like this:

SELECT ...
    FROM ( SELECT key, COUNT(*), SUM(..) FROM .. GROUP BY .. ) AS a
    JOIN b  ON ...
    JOIN c  ON ...

edited Nov 15 '19 at 18:06

answered Nov 15 '19 at 04:04

Rick James

135,179
13
127
222

" And usually the cure is to first devise a query that uses the minimum number of tables needed to get the correct sum from the intermediate table. " - So you suggest to divide this SQL to two separate SQL requests? One to get count() and sum() and other to get all other staff with different joins? – Dmitry Nov 15 '19 at 06:59
@Dmitry - Usually no need to have a separate SQL. I added an example of a "derived table". Arth has an example, too. – Rick James Nov 15 '19 at 18:07

score 1 · Answer 3 · answered Nov 15 '19 at 11:09

Adding a third option to @Paul's excellent answer

You could do your counts separately, put them together using UNION ALL, and then SUM these rows

  SELECT agg.userid,
         SUM(agg.order_count) AS total_orders,
         SUM(agg.revenue_sum) AS total_revenue,
         SUM(agg.network_user_count) AS network_users

    FROM (

       /** Orders and Revenue */
    SELECT u.userid,
           COUNT(o.id) AS order_count,
           SUM(o.total) AS revenue_sum,
           0 AS network_user_count
      FROM users u
 LEFT JOIN orders o ON o.userid=u.userid
  GROUP BY u.userid 

     UNION ALL

       /** Network Users */
    SELECT u.userid, 
           0 AS order_count,
           0 AS revenue_sum,
           COUNT(un.userid) AS network_user_count
      FROM users u
 LEFT JOIN users un 
        ON un.networkid = u.networkid
  GROUP BY u.userid

         ) agg 

GROUP BY agg.userid
ORDER BY agg.userid DESC;

The inner query agg will give results like

| userid | order_count | revenue_sum | network_user_count |
| ------ | ----------- | ----------- | ------------------ |
| …
| 24     | 3           | 175         | 0                  |
| 24     | 0           | 0           | 2                  |
| …

And the outer query will then combine these rows with SUMs

It's slightly more long winded, but I have used this method in some of our projects

Interesting idea, thanks Interesting what of 3 methods works more fast (by performance) on large databases.. — Dmitry, Nov 17 '19 at 06:56
@Dmitry You are welcome! I honestly don't know, I think the 3 methods are comparable in terms of work to do, so it will be up to how they interact with any indexes you have in place.. I suggest experimenting with the different options and EXPLAIN SELECT — Arth, Nov 18 '19 at 09:38

sum() does not work properly in MySQL query with multiple joins (group by does not work as expected)

3 Answers3