65

I have this MySQL query:

SELECT DAYOFYEAR(`date`)  AS d, COUNT(*) 
FROM  `orders` 
WHERE  `hasPaid` > 0
GROUP  BY d
ORDER  BY d

Which returns something like this:

d  | COUNT(*) |
20 |  5       |
21 |  7       |
22 | 12       |
23 |  4       |

What I'd really like is another column on the end to show the running total:

d  | COUNT(*) | ??? |
20 |  5       |   5 |
21 |  7       |  12 |
22 | 12       |  24 |
23 |  4       |  28 |

Is this possible?

Jay Prall
  • 5,295
  • 5
  • 49
  • 79
nickf
  • 537,072
  • 198
  • 649
  • 721
  • possible duplicate of [Create a Cumulative Sum Column in MySQL](http://stackoverflow.com/questions/2563918/create-a-cumulative-sum-column-in-mysql) – Ztyx Jan 09 '15 at 13:54
  • 3
    @Ztyx Your linked question was asked more than a year later. So, it would be rather the other way round. – Olaf Dietsche Oct 13 '16 at 20:14

5 Answers5

122

Perhaps a simpler solution for you and prevents the database having to do a ton of queries. This executes just one query then does a little math on the results in a single pass.

SET @runtot:=0;
SELECT
   q1.d,
   q1.c,
   (@runtot := @runtot + q1.c) AS rt
FROM
   (SELECT
       DAYOFYEAR(`date`) AS d,
       COUNT(*) AS c
    FROM  `orders`
    WHERE  `hasPaid` > 0
    GROUP  BY d
    ORDER  BY d) AS q1

This will give you an additional RT (running total) column. Don't miss the SET statement at the top to initialize the running total variable first or you will just get a column of NULL values.

Autosoft
  • 1,346
  • 1
  • 9
  • 6
  • 1
    that works brilliantly! Looking at the `EXPLAIN` on this shows it to be much more efficient than the previously accepted answer – nickf Aug 17 '09 at 23:04
  • Key point is to use a subquery. This makes it reliable in complex queries that involve multiple tables or aggregations. – Álvaro González Aug 02 '11 at 11:41
  • 2
    For those wanting to do something like this with PHP's basic MySQL functions, be sure to run the first line separately (but still before the 2nd). –  Aug 11 '11 at 12:50
  • 4
    Ariel, please provide a working example as an answer to the post instead of leaving a flyby. Would love to see how your suggestion works! – Andrew Hedges Jan 03 '13 at 15:47
  • 41
    It's worth to mention that `@rentot` variable could be set right in the `SELECT` statement in `FROM` clause like this: `... AS q1, (SELECT @runtot:=0) AS n`. This might make life for php folks much easier because now you have only one statement query. – peterm Jan 27 '13 at 08:17
  • Isn't it two passes: one for each `SELECT` statement? The number of rows in the temporary/intermediate table is limited thanks to the `GROUP by DAYOFYEAR(date)` though – PaulH Oct 29 '15 at 10:12
11
SELECT 
   DAYOFYEAR(O.`date`)  AS d, 
   COUNT(*),
   (select count(*) from `orders` 
       where  DAYOFYEAR(`date`) <= d and   `hasPaid` > 0)
FROM  
  `orders` as O
WHERE  
  O.`hasPaid` > 0
GROUP  BY d
ORDER  BY d

This will require some syntactical tuning (I don't have MySQL to test it), but it shows you the idea. THe subquery just has to go back and add up everything fresh that you already included in the outer query, and it has to do that for every row.

Take a look at this question for how to use joins to accomplish the same.

To address concerns about performance degradation with growing data: Since there are max. 366 days in a year, and I assume that you are not running this query against multiple years, the subquery will get evaluated up to 366 times. With proper indices on the date and the hasPaid flag, you'll be ok.

Community
  • 1
  • 1
cdonner
  • 37,019
  • 22
  • 105
  • 153
  • 1
    Be aware that this will be extremely slow on big, average and some of the small databases, because it needs to do as many additional queries as there will be rows in result – Sergej Andrejev Mar 20 '09 at 02:18
  • Agree. I +1'd this answer because it is clever, and we've all used solutions like this when needed, but we are also all aware there is a cost. Depends on where you need the running count. For the business logic? Then maybe do this in the DB. For the view? Do it in code. – Jarret Hardie Mar 20 '09 at 02:24
7

Starting with MySQL 8, you will be using window functions for this kind of query:

SELECT dayofyear(`date`) AS d, count(*), sum(count(*)) OVER (ORDER BY dayofyear(`date`))
FROM `orders`
WHERE `hasPaid` > 0
GROUP BY d
ORDER BY d

In the above query, the aggregate function count(*) is nested inside of the window function sum(..) OVER (..), which is possible because of the logical order of operations in SQL. If that's too confusing, you can easily resort to using a derived table or a WITH clause to better structure your query:

WITH daily (d, c) AS (
  SELECT dayofyear(`date`) AS d, count(*)
  FROM `orders`
  WHERE `hasPaid` > 0
  GROUP BY d
)
SELECT d, c, sum(c) OVER (ORDER BY d)
ORDER BY d
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
1

It is possible to calculate a running balance using a temporary table in MySQL. The following query should work:

CREATE TEMPORARY table orders_temp1 (SELECT id, DAYOFYEAR(`date`)  AS d, COUNT(*) as total FROM  `orders` WHERE  `hasPaid` > 0 GROUP BY d ORDER  BY d);
CREATE TEMPORARY table orders_temp2 (SELECT * FROM orders_temp1);
SELECT d, total, (SELECT SUM(t2.total) FROM orders_temp2 t2 WHERE t2.id<=t1.id) as running_total FROM orders_temp1 t1;

A temporary table is used for organizing the query. Note that a temporary table only exists for the duration of the connection to the MySQL server

The above query uses a sub query, which returns balance of all rows in the temporary table upto and including the current row. The balance is assigned to the current row in the actual table

Nadir Latif
  • 3,690
  • 1
  • 15
  • 24
  • The query uses temporary tables which are stored in memory. They are much faster than disk based tables. But if your tables have a lot of data or the query will be executed by several users at the same time, then using temporary tables will make the query slow. See: https://stackoverflow.com/questions/18865438/from-a-performance-perspective-how-efficient-is-it-to-use-a-mysql-temporary-tab – Nadir Latif Mar 02 '20 at 13:26
0

I would say that this is impossible every resulting row should be independent. Use programming language for getting these values

Sergej Andrejev
  • 9,091
  • 11
  • 71
  • 108
  • Given the nature of relational math, and the fact that you're using group by, even if mysql has some hack to make this possible, it would be less convoluted to just do it in a programming language as Sergej suggests. – Jarret Hardie Mar 20 '09 at 02:05
  • 7
    I would disagree. Splitting the processing tasks between the database and the application layer is problematic from a reuse and maintenance perspective. If you want to use this data in different places, maybe on a report and on a screen, you'd have to duplicate the running totals logic. – cdonner Mar 20 '09 at 02:08
  • +1 you're right: this would be easier and better overall in the programming logic - I was trying to see if there was some magic awesome function to do it. – nickf Mar 20 '09 at 02:14
  • When you have a considerable amount of data you have to compromise some purity, and also, in this case it really doesn't look like true "logic" to me, it could be seen just as a "visual aid", there is no real business logic in accumulate values. – Sam Mar 20 '09 at 02:16
  • Agree with Sam. A report and a screen are both view code. The re-usable "logic" should be encapsulated in the view layer... application design notwithstanding. – Jarret Hardie Mar 20 '09 at 02:26
  • 2
    A running total column is not view logic. It is data embedded in the table. Yes, SQL is relational set logic, but nearly always ordered. Suggesting this is impossible is incorrect. Suggesting it's poor form requires discussion. – Brendan Jan 10 '13 at 03:48
  • 1
    This is actually quite easy with modern DBMS (using window functions) it is far from being "impossible" –  May 20 '14 at 22:35