3

I have two models, Project and Session. One project has many sessions, one user has many projects:

class Project(models.Model):
    class Meta:
        ordering = [models.functions.Lower("name")]

    name = models.CharField(max_length=255)
    user = models.ForeignKey(User, on_delete=models.CASCADE)

class Session(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField()
    timezone = TimeZoneField()
    breaks = models.IntegerField(default=0, validators=[MinValueValidator(0)])
    project = models.ForeignKey(Project, on_delete=models.CASCADE)

    def duration(self):
        # returns minutes in (end - start)

I want a way to get all projects for a given user, sorted by the sum of duration in all its sessions. As session.duration() is not a database field, but rather is calculated from database fields, I cannot get this information in a single database query.

My current solution is:

sessions = Session.objects.filter(project__user=self)
groups = [[a, sum([s.duration() for s in b])] for a, b in groupby(
 sessions, key=lambda s: s.project
)]
groups = sorted(groups, key=lambda g: g[1], reverse=True)
return [g[0] for g in groups]

This gets all relevant sessions in a single query, but then I group them by project and this takes too long - about a second when there are about a 100 projects. Is there a way to accomplish this that takes less time? And ideally doesn't require a database call for every project?

I am using Django 2.0.

Sam Ireland
  • 512
  • 4
  • 20

1 Answers1

1

You can use annotations and aggregation to achieve this. First, modify the Session model a bit by changing this line:

project = models.ForeignKey(Project, on_delete=models.CASCADE)

to this:

project = models.ForeignKey(Project, related_name='sessions', on_delete=models.CASCADE)

-now every Project instance will have a sessions field, which will contain the queryset of all Sessions related to that Project.

Instead of taking all user sessions like you do now, you can take all user's Projects and loop through each project's Sessions like:

projects = Project.objects.filter(user=self)
for p in projects:
    sessions = p.sessions.all()

Then you can manipulate the sessions queryset, annotating them with an expression field like:

from django.db.models import ExpressionWrapper, F, fields

duration_ = ExpressionWrapper(F('end') - F('start'), output_field=fields.DurationField())
sessions = p.sessions.annotate(d=duration_)

At this point each member of the sessions queryset will have a field named d holding the duration of the sorresponding Session. To sum the durations, we can use the aggregation feature of Django querysets, like this:

from django.db.models import Sum
total = sessions.aggregate(total_duration=Sum('d'))["total_duration"]

What we're doing on the 2nd line is creating a single element from a queryset ("aggregating" it), by adding all the values in the d field, and assigning the result to a field called total_duration. The result of this expression:

sessions.aggregate(total_duration=Sum('d'))

is a dict with only one key (total_duration), from which we take the value.

Next, you can build a list of projects and durations, and sort it afterwards by duration, e.g. like this:

import operator
plist = []
for p in projects:
    sessions = p.sessions.annotate(d=duration_)
    total = sessions.aggregate(total_duration=Sum('d'))["total_duration"]
    # total holds the sum of this project's sessions
    plist.append({'p':p,'total':total})
plist.sort(key=operator.itemgetter('total'))

projects = [item['p'] for item in plist]

To sum it up:

import operator
from django.db.models import F, Sum, ExpressionWrapper, fields

duration_ = ExpressionWrapper(F('end') - F('start'), output_field=fields.DurationField())
projects = Project.objects.filter(user=self)
plist = []

for p in projects:
    sessions = p.sessions.annotate(d=duration_)
    total = sessions.aggregate(total_duration=Sum('d'))["total_duration"]
    # total holds the sum of this project's sessions
    plist.append({'p':p,'total':total})

plist.sort(key=operator.itemgetter('total'))

projects = [item['p'] for item in plist]

Reference: this answer, Django Query Expressions, Django Aggregation

Paolo Stefan
  • 10,112
  • 5
  • 45
  • 64
  • Thank you for this - I'm still a little unclear though. Here `total` is the sum of every session's duration. The output I need is a list of projects sorted by total duration of that project's sessions. I feel like you've got 90% of the way there, I just can't work out how to use the duration_ field above to achieve that. – Sam Ireland May 05 '18 at 16:21
  • @SamIreland I've updated my answer. Now it should be complete. – Paolo Stefan May 05 '18 at 21:01