MySQL aggregate query not behaving as expected

Question

I am trying to aggregate some call records and doing something wrong - but not sure what.

I have the following query:

SELECT 
    c.name, 
    IF(MONTH(start) = 1, SUM(duration),0) AS 'Jan',
    IF(MONTH(start) = 2, SUM(duration),0) AS 'Feb',
    IF(MONTH(start) = 3, SUM(duration),0) AS 'Mar',
    IF(MONTH(start) = 4, SUM(duration),0) AS 'Apr',
    IF(MONTH(start) = 5, SUM(duration),0) AS 'Jun',
    IF(MONTH(start) = 6, SUM(duration),0) AS 'Jul',
    IF(MONTH(start) = 7, SUM(duration),0) AS 'Aug'
FROM
    call_history ch, client c
WHERE
    ch.client_reseller_id = c.id
GROUP BY ch.client_reseller_id
ORDER BY c.name;

The field start contains the date/time of a call. The table only contains records for one year, so no worries about not filtering on year.

The results I get are not as expected:

+--------------------------------+----------+------+------+--------+------+------+------+
| name                           | Jan      | Feb  | Mar  | Apr    | Jun  | Jul  | Aug  |      
|+-------------------------------+----------+------+------+--------+------+------+------+
| Come company                   |  5243080 |    0 |    0 |      0 |    0 |    0 |    0 |
| Other cust                     |  4085085 |    0 |    0 |      0 |    0 |    0 |    0 |
| Someone                        |  1449543 |    0 |    0 |      0 |    0 |    0 |    0 |
| Demo Reseller                  |     2342 |    0 |    0 |      0 |    0 |    0 |    0 |
+--------------------------------+----------+------+------+--------+------+------+------+

What am I doing wrong?

That's because your `group by` usage is invalid (standard) SQL. Every other DBMS would reject your query but MySQL chooses to return "random" (they call it indeterminate) results instead and not bother you with an error message. For details see here: http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/ and here: http://rpbouman.blogspot.de/2007/05/debunking-group-by-myths.html — , Aug 15 '14 at 11:32

score 3 · Answer 1 · answered Aug 15 '14 at 11:06

Change like this

SELECT 
    c.name, 
   SUM(IF(MONTH(start) = 1, duration,0)) AS 'Jan',
    SUM(IF(MONTH(start) = 2, duration,0)) AS 'Feb',
    SUM(IF(MONTH(start) = 3, duration,0)) AS 'Mar',
    SUM(IF(MONTH(start) = 4, duration,0)) AS 'Apr',
    SUM(IF(MONTH(start) = 5, duration,0)) AS 'Jun',
    SUM(IF(MONTH(start) = 6, duration,0)) AS 'Jul',
    SUM(IF(MONTH(start) = 7, duration,0)) AS 'Aug'
FROM
    call_history ch, client c
WHERE
    ch.client_reseller_id = c.id
GROUP BY ch.client_reseller_id
ORDER BY c.name;

Bingo :) Couldn't see that for looking. Thanks very much. – btongeorge Aug 15 '14 at 11:46 — btongeorge, Aug 15 '14 at 11:46

score 1 · Accepted Answer · edited May 23 '17 at 10:26

You want:

  SELECT c.name, 
         SUM((MONTH(ch.start)=1)*ch.duration) 'Jan',
         SUM((MONTH(ch.start)=2)*ch.duration) 'Feb',
         SUM((MONTH(ch.start)=3)*ch.duration) 'Mar',
         SUM((MONTH(ch.start)=4)*ch.duration) 'Apr',
         SUM((MONTH(ch.start)=5)*ch.duration) 'Jun',
         SUM((MONTH(ch.start)=6)*ch.duration) 'Jul',
         SUM((MONTH(ch.start)=7)*ch.duration) 'Aug'
    FROM call_history ch
    JOIN client c
      ON c.id = ch.client_reseller_id
GROUP BY c.name
ORDER BY c.name;

This uses the fact that booleans are evaluated as true => 1 and false => 0 in MySQL.

It avoids using an IF which is one more logical branch.. but I'm not sure which will be faster because of the * operation, it may be fun to benchmark. IF is also restricted to MySQL, you could use the standard CASE.

I could concede that some could find IF/CASE more readable than using the BOOLEAN => 1,0 dynamic which again may be restricted to MySQL.

I have also included the table aliases on the columns, changed your GROUP BY to the more obvious c.name and used an explicit JOIN.

These are readability issues..

Including the table aliases means you can read the query in isolation and know where each column is coming from.. for example when posting questions! I had to assume that start and duration were from the call_history, which doesn't matter here but absolutely can in other questions.

I tend to GROUP BY the non-aggregate columns in my SELECT in MySQL just to make sure I haven't missed anything. Correct me if I'm wrong, but I think other engines require it. If you have duplicate c.names I would select both c.id and c.name and GROUP BY both to remove any confusion.

As for use of implicit JOINs you can see a discussion here

Seems to have complicated things somewhat - what are the benefits of your approach? — btongeorge, Aug 15 '14 at 11:47
@btongeorge . . . Proper `join` syntax is not a "complication". It is a benefit and you should learn how to use it. — Gordon Linoff, Aug 15 '14 at 12:11
@btongeorge I have provided the benefits as I see them in an edit of the answer. — Arth, Aug 15 '14 at 12:25
@Arth thanks for the clarification its an interesting solution and I learnt something there. Have up voted accordingly :) — btongeorge, Aug 28 '14 at 22:57
@GordonLinoff I will have a read on the benefits of one over the other. Thanks for the pointer, I meant no offence :) — btongeorge, Aug 28 '14 at 22:57

MySQL aggregate query not behaving as expected

2 Answers2