4

The second section of this answer uses variables to create a cumulative sum of another column. I'm doing the same thing, except that I am using a GROUP BY statement, and summing COUNT(*) instead of a column. Here is my code to create a minimal table and insert values:

CREATE TABLE `test_group_cumulative` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `group_id` int(11) unsigned NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `test_group_cumulative` (`id`, `group_id`)
VALUES
    (1, 1),
    (2, 2),
    (3, 3);

And here is the code that is failing:

SELECT
    `group_id`,
    COUNT(*) AS `count`,
    @count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;

Here is the result:

group_id    count   count_cumulative
1   1   1
2   1   1
3   1   1

As you can see, count_cumulative is NOT summing correctly. However, here's the weird part. If I replace the COUNT(*) in count_cumulative with it's value, 1, the query works correctly.

    @count_cumulative := @count_cumulative + 1 AS `count_cumulative`

Here is the correct result:

group_id    count   count_cumulative
1   1   1
2   1   2
3   1   3

Obviously, in my app, there will be more than one item in each group, so COUNT(*) won't always be 1. I know there are ways to do this with joins or subqueries, and I'll do that if I have to, but in my mind this SHOULD work. So why isn't COUNT(*) working inside of a cumulative sum?

Community
  • 1
  • 1
Joel
  • 2,654
  • 6
  • 31
  • 46

2 Answers2

2

I agree with @Ashalynd, the value of count(*) is not evaluated yet. Here is a little experiment I did :

1.
    SELECT
        GROUP_ID,    
        @COUNTER := @COUNTER + COUNT(*)  GROUPCOUNT,
        @COUNTER COUNTER
     FROM
        TEST_GROUP_CUMULATIVE, 
        (SELECT @COUNTER := 0) R
    GROUP BY
        GROUP_ID;

-- RESULT
============

   GROUP_ID    GROUPCOUNT    COUNTER
  ------------------------------------     
   1           1             0
   2           1             0
   3           1             0

2.
    SELECT @COUNTER;

    -- RESULT
    =============

    @COUNTER
    --------
    1

For each group the variable is being initialized as 0. This means COUNT(*) has not been evaluated yet.

Also, when you do:

 1.
    SELECT
        GROUP_ID,    
        @COUNTER := @COUNTER + 1  GROUPCOUNT,
        @COUNTER COUNTER
     FROM
        TEST_GROUP_CUMULATIVE, 
        (SELECT @COUNTER := 0) R
    GROUP BY
        GROUP_ID;

-- RESULT
============
   GROUP_ID    GROUPCOUNT    COUNTER
  ------------------------------------     
   1           1             1
   2           1             2
   3           1             3

2.    
SELECT @COUNTER;

    -- RESULT
    =============

    @COUNTER
    --------
    3

It does not have to evaluate 1. It directly sums it up and it gives you the cumulative sum.

Mubin Shrestha
  • 398
  • 3
  • 22
1

This is a problem I often face when doing time series analysis. My preferred way to tackle this is to wrap it into a second select and introduce the counter in the last layer. And you can adapt this technique to more complicated data flows using temporary tables, if reqiured.

I did this small sqlfiddle using the schema you present: http://sqlfiddle.com/#!2/cc97e/21

And here is the query to get the cumulative count:

SELECT
tgc.group_id, @count_cumulative := @count_cumulative + cnt as cum_cnt
FROM (
  SELECT
    group_id, COUNT(*) AS cnt
  FROM `test_group_cumulative` 
  group by group_id
  order by id) AS `tgc`, 
(SELECT @count_cumulative := 0) AS `temp_var`; 

This is the result I get:

GROUP_ID    CUM_CNT
1           1
2           2
3           3

The reason your attempt did not work:

When you do a group by with the temporary variable, mysql executes individual groups independently, and at the time each group is assigned the temporary variable current value, which in this case is 0.

If, you ran this query:

SELECT @count_cumulative;

immediately after

SELECT
    `group_id`,
    COUNT(*) AS `count`,
    @count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;

you would get the value 1. For each of your groups, the @count_cumulative is being reset to 0.

Hence, in my proposed solution, I circumvent this issue by generating the 'group-counts' first and then doing the accumulation.

Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85