11

Is it possible to calculate the cumulative (running) sum using django's orm? Consider the following model:

class AModel(models.Model):
    a_number = models.IntegerField()

with a set of data where a_number = 1. Such that I have a number ( >1 ) of AModel instances in the database all with a_number=1. I'd like to be able to return the following:

AModel.objects.annotate(cumsum=??).values('id', 'cumsum').order_by('id')
>>> ({id: 1, cumsum: 1}, {id: 2, cumsum: 2}, ... {id: N, cumsum: N})

Ideally I'd like to be able to limit/filter the cumulative sum. So in the above case I'd like to limit the result to cumsum <= 2

I believe that in postgresql one can achieve a cumulative sum using window functions. How is this translated to the ORM?

wrdeman
  • 810
  • 10
  • 23
  • I don't get it. What is cumsum? And there's only one record with id=1 – Mihai Zamfir Apr 20 '17 at 11:23
  • cumsum == cumulative sum, obviously this is for more than one record - edited to make clearer so the size of the set of data is greater than one. – wrdeman Apr 20 '17 at 11:32
  • I don't think you cand do it with the ORM...use python instead – Mihai Zamfir Apr 20 '17 at 11:41
  • The phrase you are looking for is the [running total](https://en.wikipedia.org/wiki/Running_total) (or running sum). Which is a specialized case of [moving aggregates](https://www.postgresql.org/docs/current/static/xaggr.html#XAGGR-MOVING-AGGREGATES). Which are a kind of [window functions](https://www.postgresql.org/docs/current/static/tutorial-window.html). – pozs Apr 20 '17 at 13:09
  • I think cumulative sum and running total are the same thing http://mathworld.wolfram.com/CumulativeSum.html . But yes it is a window function I'm after. – wrdeman Apr 20 '17 at 13:13
  • @wrdeman yes, you may be right, but the term *cumulative sum* is rarely used (I mainly found MATLAB references and this worfram). I'm not a native english speaker though, but I've heard of "running sum" before, a lot. – pozs Apr 20 '17 at 13:47
  • it's usage may be more field specific (e.g. physics/statistics where it is not that uncommon) but thanks for pointing this out and I've amended the title to reflect this alternative usage. – wrdeman Apr 20 '17 at 13:56

5 Answers5

15

For reference, starting with Django 2.0 it is possible to use the Window function to achieve this result:

AModel.objects.annotate(cumsum=Window(Sum('a_number'), order_by=F('id').asc()))\
              .values('id', 'cumsum').order_by('id', 'cumsum')
Wtower
  • 18,848
  • 11
  • 103
  • 80
Campi
  • 1,932
  • 1
  • 16
  • 21
5

From Dima Kudosh's answer and based on https://stackoverflow.com/a/5700744/2240489 I had to do the following: I removed the reference to PARTITION BY in the sql and replaced with ORDER BY resulting in.

AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'), 
        template='%(expressions)s OVER (ORDER BY %(order_by)s)', 
        order_by="id"
    ) 
).values('id', 'cumsum').order_by('id', 'cumsum')

This gives the following sql:

SELECT "amodel"."id",
SUM("amodel"."a_number") 
OVER (ORDER BY id) AS "cumsum" 
FROM "amodel" 
GROUP BY "amodel"."id" 
ORDER BY "amodel"."id" ASC, "cumsum" ASC

Dima Kudosh's answer was not summing the results but the above does.

Community
  • 1
  • 1
wrdeman
  • 810
  • 10
  • 23
2

For posterity, I found this to be a good solution for me. I didn't need the result to be a QuerySet, so I could afford to do this, since I was just going to plot the data using D3.js:

import numpy as np
import datettime

today = datetime.datetime.date()

raw_data = MyModel.objects.filter('date'=today).values_list('a_number', flat=True)

cumsum = np.cumsum(raw_data)
Scott Skiles
  • 3,647
  • 6
  • 40
  • 64
1

You can try to do this with Func expression.

from django.db.models import Func, Sum

AModel.objects.annotate(cumsum=Func(Sum('a_number'), template='%(expressions)s OVER (PARTITION BY %(partition_by)s)', partition_by='id')).values('id', 'cumsum').order_by('id')
Dima Kudosh
  • 7,126
  • 4
  • 36
  • 46
  • Thanks, really appreciate your answer. It didn't quite work for me and I've posted my amendment. – wrdeman Apr 20 '17 at 12:53
0

Check this

AModel.objects.order_by("id").extra(select={"cumsum":'SELECT SUM(m.a_number) FROM table_name m WHERE m.id <= table_name.id'}).values('id', 'cumsum')

where table_name should be the name of table in database.

itzMEonTV
  • 19,851
  • 4
  • 39
  • 49