4

I am trying to perform a cumulative sum of values in SQLite. I initially only needed to sum a single column and had the code

SELECT
    t.MyColumn,
    (SELECT Sum(r.KeyColumn1) FROM MyTable as r WHERE r.Date < t.Date)
FROM MyTable as t

Group By t.Date;

which worked fine.

Now I wanted to extend this to more columns KeyColumn2 and KeyColumn3 say. Instead of adding more SELECT statements I thought it would be better to use a join and wrote the following

SELECT
    t.MyColumn,
    Sum(r.KeyColumn1),
    Sum(r.KeyColumn2),
    Sum(r.KeyColumn3)
FROM MyTable as t
Left Join MyTable as r On (r.Date < t.Date)

Group By t.Date;

However this does not give me the correct answer (instead it gives values that are much larger than expected). Why is this and how could I correct the JOIN to give me the correct answer?

rwolst
  • 12,904
  • 16
  • 54
  • 75
  • Please post correct code. Before coming to joins and sums, "SELECT T.* ... GROUP BY T.DATE" cannot possibly work because you are grouping only by date and the other fields can differ. – Lorenzo Gatti Jan 27 '14 at 14:08
  • Yes, I wrote that just to make it clear that the `SELECT` would also be dependent on the original table. However it is incorrect so I will change. But I think that its irrelevant to the actual problem. – rwolst Jan 27 '14 at 14:43

2 Answers2

2

You are likely getting what I would call mini-Cartesian products: your Date values are probably not unique and, as a result of the self-join, you are getting matches for each of the non-unique values. After grouping by Date the results are just multiplied accordingly.

To solve this, the left side of the join must be rid of duplicate dates. One way is to derive a table of unique dates from your table:

SELECT DISTINCT Date
FROM MyTable

and use it as the left side of the join:

SELECT
    t.Date,
    Sum(r.KeyColumn1),
    Sum(r.KeyColumn2),
    Sum(r.KeyColumn3)
FROM (SELECT DISTINCT Date FROM MyTable) as t
Left Join MyTable as r On (r.Date < t.Date)
Group By t.Date;

I noticed that you used t.MyColumn in the SELECT clause, while your grouping was by t.Date. If that was intentional, you may be relying on undefined behaviour there, because the t.MyColumn value would probably be chosen arbitrarily among the (potentially) many in the same t.Date group.

For the purpose of this example, I assumed that you actually meant t.Date, so, I replaced the column accordingly, as you can see above. If my assumption was incorrect, please clarify.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
0

Your join is not working cause he will find way more possibilities to join then your subselect would do.

The join is exploding your table.

The sub select does a sum of all records where the date is lower then the one from the current record.

The join joins every row multiple times aslong as the date is lower then the current record. This mean a single record could do as manny joins as there are records with a date lower. This causes multiple records. And in the end a higher SUM.

If you want the sum from mulitple columns you will have to use 3 sub query or define a unique join.

Luuk
  • 145
  • 6