16

I have a Ticket booking model

class Movie(models.Model):
    name = models.CharField(max_length=254, unique=True)

class Show(models.Model):
    day = models.ForeignKey(Day)
    time = models.TimeField(choices=CHOICE_TIME)
    movie = models.ForeignKey(Movie)

class MovieTicket(models.Model):
    show = models.ForeignKey(Show)
    user = models.ForeignKey(User)
    booked_at = models.DateTimeField(default=timezone.now)

I would like to filter MovieTicket by its user field, group the results by the show field, and order them by the most recent booking time. The response should be JSON data produced with Django REST framework, like this:

[
    {
        "show": 4,
        "movie": "Lion king",
        "time": "07:00 pm",
        "day": "23 Apr 2017",
        "total_tickets": 2
    },
    {
        "show": 7,
        "movie": "Gone girl",
        "time": "02:30 pm",
        "day": "23 Apr 2017",
        "total_tickets": 1
    }
]

I tried this way:

>>> MovieTicket.objects.filter(user=23).order_by('-booked_at').values('show').annotate(total_tickets=Count('show'))
<QuerySet [{'total_tickets': 1, 'show': 4}, {'total_tickets': 1, 'show': 4}, {'total_tickets': 1, 'show': 7}]>

But it's not grouping by show. Also, how can I add other related fields (i.e., show__movie__name, show__day__date, show__time)?

hynekcer
Aamu

4 Answers

8

I explain it more generally using a graph of the database model. The approach can be applied to any "GROUP BY" query with extra content.

          +-------------------------+
          | MovieTicket (booked_at) |
          +-----+--------------+----+
                |              |
      +---------+--------+  +--+---+
      |    Show (time)   |  | User |
      ++----------------++  +------+
       |                |
+------+-------+  +-----+------+
| Movie (name) |  | Day (date) |
+--------------+  +------------+

The question is: how to summarize MovieTicket (the topmost object), grouped by Show (one related object) and filtered by User (another related object), while reporting details from deeper related objects (Movie and Day) and sorting the results by a field aggregated from the topmost model per group (the booking time of the most recent MovieTicket in the group).

The answer, explained in more general steps:

  • Start with the topmost model:
    (MovieTicket.objects ...)
  • Apply filters:
    .filter(user=user)
  • It is important to group by the pk of the nearest related models (at least those that are not made constant by the filter). Here that is only "Show", because the "User" object is fixed to one user by the filter:
    .values('show_id')
    Even if all the other fields were unique together (show__movie__name, show__day__date, show__time), it is better for the database engine's optimizer to group the query by show_id, because all these other fields depend on show_id and cannot affect the number of groups.
  • Annotate necessary aggregation functions:
    .annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
  • Add required dependent fields:
    .values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
  • Sort what is necessary:
    .order_by('-last_booking') (descending from the latest to the oldest)
    It is very important not to output or sort by any field of the topmost model without wrapping it in an aggregation function. (The Min and Max functions are good for sampling a value from a group.) Every field not wrapped in an aggregation is added to the "GROUP BY" list, and that breaks the intended groups. (More tickets to the same show, e.g. for friends, may be booked gradually, but they should be counted together and reported by the latest booking.)
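The last point can be reproduced outside Django. The following is a minimal sketch using the standard-library sqlite3 module; the table name `ticket` and its sample rows are assumptions made for illustration, not part of the original models. It compares grouping when the raw `booked_at` column is part of the GROUP BY list with grouping by `show_id` alone and sorting by `MAX(booked_at)`.

```python
import sqlite3

# Hypothetical, simplified ticket table (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, show_id INTEGER, booked_at TEXT);
    INSERT INTO ticket VALUES
        (1, 4, '2017-04-20 10:00:00'),
        (2, 4, '2017-04-22 08:00:00'),
        (3, 7, '2017-04-21 09:00:00');
""")

# The raw column in GROUP BY: each distinct booking time forms its own group.
split = conn.execute("""
    SELECT show_id, COUNT(show_id) AS total_tickets
    FROM ticket
    GROUP BY show_id, booked_at
    ORDER BY booked_at DESC
""").fetchall()

# The column wrapped in MAX(): one group per show, sorted by latest booking.
merged = conn.execute("""
    SELECT show_id, COUNT(show_id) AS total_tickets, MAX(booked_at) AS last_booking
    FROM ticket
    GROUP BY show_id
    ORDER BY last_booking DESC
""").fetchall()

print(split)   # three groups, each with a count of 1
print(merged)  # two groups, show 4 with a count of 2
```

The first query mirrors what happens when a non-aggregated field of the topmost model is output or used for sorting; the second mirrors the aggregated form.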

Put it together:

from django.db.models import Count, Max

qs = (MovieTicket.objects
      .filter(user=user)
      .values('show_id', 'show__movie__name', 'show__day__date', 'show__time')
      .annotate(total_tickets=Count('show'), last_booking=Max('booked_at'))
      .order_by('-last_booking')
      )

The queryset can easily be converted to JSON as zaphod100.10 demonstrates in his answer, or directly, for people not interested in Django REST framework, this way:

from collections import OrderedDict
import json

print(json.dumps([
    # OrderedDict takes a list of (key, value) pairs
    OrderedDict([
        ('show', x['show_id']),
        ('movie', x['show__movie__name']),
        ('time', x['show__time'].strftime('%I:%M %p').lower()),  # e.g. "07:00 pm"
        ('day', x['show__day__date'].strftime('%d %b %Y')),      # e.g. "23 Apr 2017"
        ('total_tickets', x['total_tickets']),
        # field 'last_booking' is unused
    ]) for x in qs
]))
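The rows produced by the queryset contain date and time objects, which `json.dumps` does not serialize by itself. If custom formatting is not needed, one possible shortcut is the `default` parameter of `json.dumps`. The sketch below assumes hypothetical sample rows shaped like the dictionaries yielded by the queryset; it is an illustration, not the method used in the answer.

```python
import json
from datetime import date, time

# Hypothetical rows, shaped like the dictionaries yielded by values().
rows = [
    {'show_id': 4, 'show__movie__name': 'Lion king',
     'show__time': time(19, 0), 'show__day__date': date(2017, 4, 23),
     'total_tickets': 2},
    {'show_id': 7, 'show__movie__name': 'Gone girl',
     'show__time': time(14, 30), 'show__day__date': date(2017, 4, 23),
     'total_tickets': 1},
]

# default=str is called for every object json cannot serialize on its own,
# so dates and times are emitted in their ISO-like str() form.
payload = json.dumps(rows, default=str)
print(payload)
```

This yields values such as "19:00:00" and "2017-04-23" rather than the display format requested in the question, so explicit `strftime` formatting is still needed when the exact output shape matters.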

Verify the query:

>>> print(str(qs.query))
SELECT app_movieticket.show_id, app_movie.name, app_day.date, app_show.time,
    COUNT(app_movieticket.show_id) AS total_tickets,
    MAX(app_movieticket.booked_at) AS last_booking
FROM app_movieticket
INNER JOIN app_show ON (app_movieticket.show_id = app_show.id)
INNER JOIN app_movie ON (app_show.movie_id = app_movie.id)
INNER JOIN app_day ON (app_show.day_id = app_day.id)
WHERE app_movieticket.user_id = 23
GROUP BY app_movieticket.show_id, app_movie.name, app_day.date, app_show.time
ORDER BY last_booking DESC
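The generated SQL can also be tried without a Django project. The following is a sketch under stated assumptions: an in-memory SQLite database, table names copied from the query above, table aliases added for brevity, and invented sample rows (including a ticket of a second user, 99, to show the effect of the filter).

```python
import sqlite3

# Hypothetical schema mirroring the models in the question (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_movie (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE app_day (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE app_show (id INTEGER PRIMARY KEY, day_id INTEGER,
                           time TEXT, movie_id INTEGER);
    CREATE TABLE app_movieticket (id INTEGER PRIMARY KEY, show_id INTEGER,
                                  user_id INTEGER, booked_at TEXT);

    INSERT INTO app_movie VALUES (1, 'Lion king'), (2, 'Gone girl');
    INSERT INTO app_day VALUES (1, '2017-04-23');
    INSERT INTO app_show VALUES (4, 1, '19:00', 1), (7, 1, '14:30', 2);
    INSERT INTO app_movieticket VALUES
        (1, 4, 23, '2017-04-20 10:00:00'),
        (2, 7, 23, '2017-04-21 09:00:00'),
        (3, 4, 23, '2017-04-22 08:00:00'),
        (4, 4, 99, '2017-04-22 12:00:00');
""")

GROUPED_SQL = """
    SELECT t.show_id, m.name, d.date, s.time,
           COUNT(t.show_id) AS total_tickets,
           MAX(t.booked_at) AS last_booking
    FROM app_movieticket t
    INNER JOIN app_show s ON t.show_id = s.id
    INNER JOIN app_movie m ON s.movie_id = m.id
    INNER JOIN app_day d ON s.day_id = d.id
    WHERE t.user_id = ?
    GROUP BY t.show_id, m.name, d.date, s.time
    ORDER BY last_booking DESC
"""

# One row per show for user 23, most recently booked show first.
rows = conn.execute(GROUPED_SQL, (23,)).fetchall()
for row in rows:
    print(row)
```

With this sample data, the extra columns in GROUP BY do not split the groups, because they all depend on `show_id`.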

Notes:

  • The graph of models is similar to a ManyToMany relationship, but MovieTickets are individual objects and probably hold seat numbers.

  • It would be easy to get a similar report for more users with one query. The field 'user_id' and the user's name would be added to "values(...)".

  • The related model Day is not intuitive, but it is clear that it has a field date and hopefully also some non-trivial fields, maybe important for scheduling shows with respect to events like cinema holidays. It would be useful to make the field 'date' the primary key of the Day model and so spare a relationship lookup in many queries like this.

(All important parts of this answer can be found in the two oldest answers, by Todor and zaphod100.10. Unfortunately, these answers were not combined, and nobody except me up-voted them, even though the question has many up-votes.)

hynekcer
3

I would like to filter MovieTicket with its user field and group them according to its show field, and order them by the recent booked time.

This queryset will give you exactly what you want:

tickets = (MovieTicket.objects
            .filter(user=request.user)
            .values('show')
            .annotate(last_booking=Max('booked_at'))
            .order_by('-last_booking')
)

And respond back with json data using Django rest framework like this: [ { show: 4, movie: "Lion king", time: "07:00 pm", day: "23 Apr 2017", total_tickets = 2 }, { show: 7, movie: "Gone girl", time: "02:30 pm", day: "23 Apr 2017", total_tickets = 1 } ]

Well, this JSON data is not the same as the query you described. You can add total_tickets by extending the annotation, and add show__movie__name to the .values clause. This changes the grouping to show + movie name, but since a show has only one movie name, it won't matter.

However, you cannot add show__day__date and show__time, because one show has multiple date-times, so which one would you want from a group? You could, for example, fetch the maximum day and time, but this does not guarantee that a show actually takes place at that day and time, because these are separate fields that are not related to each other. So the final attempt may look like:

tickets = (MovieTicket.objects
            .filter(user=request.user)
            .values('show', 'show__movie__name')
            .annotate(
                last_booking=Max('booked_at'),
                total_tickets=Count('pk'),
                last_day=Max('show__day'),
                last_time=Max('show__time'),
            )
            .order_by('-last_booking')
)
Todor
  • I used your answer in my own. It is not necessary to call aggregation in `last_time=Max('show__time')` and `last_day=Max('show__day')`, and 'show__day__date' should be used instead because of the unusual model. Otherwise it is a perfect answer, and it is a great idea to rename fields with `annotate`, that is `.annotate(..., day=Value('show__day__date'), time=Value('show__time'))`. – hynekcer Sep 01 '17 at 14:01
  • `MySQL` is the only `RDBMS` which will [allow you](https://stackoverflow.com/questions/1023347/mysql-selecting-a-column-not-in-group-by) to do this. It's generally forbidden to have non-aggregated values in the select clause that are outside of the `group by` clause. I took a look at your answer, and I think this query will return incorrect results because you have `date` and `time` in the group by clause. The correct approach would be to `.annotate(last_show_time=Max(DateTime(date='show__date', time='show__time')))`. Of course this is not supported out of the box, but I think it's not impossible to do. – Todor Sep 01 '17 at 14:55
  • I missed the right name, `models.F()`; otherwise it works as expected. If the query is an aggregation/group_by query, every field or annotation that has no aggregation function is added to "group by". For example, this added part of the query set `... .annotate(movie=F('show__movie__name'))...` is compiled by adding these SQL parts: `SELECT ..., app_movie.name AS movie, ... GROUP BY ... app_movie.name, ...`. – hynekcer Sep 01 '17 at 19:06
  • Yes, the `ORM` is adding it, because it's forbidden for it not to be there, but once added, `show__day__date` and `show__time` change the grouping, leading to incorrect results, i.e. you don't want to have these fields in the group by clause. – Todor Sep 02 '17 at 08:21
  • OP expects to group by them. Field "show": show_id is a part of his JSON. All fields of Show depend on show_id. Group By results are not refined if any dependent field is added to group by beyond show_id. The same applies to adding fields of its related objects recursively (Movie, Day). The only object that requires aggregation on every used field is the topmost object (MovieTicket - pk, booked_at)... – hynekcer Sep 02 '17 at 12:32
  • ... You can object that your result and mine are equal. Yes, but I consider grouping by a field that is redundant due to dependency more explicit and clearer than a redundant Max(). (It is also normal that if I buy two tickets for myself and very soon two tickets for friends with seats in another row for the same projection, I want to see 4 tickets in an overview, but if I want 5 tickets for a group another day for the same movie, I want to see 4 and 5 with the exact time. Maybe 2, 2, 5 is still an acceptable result by mistake, but if Max() is used, any mistake would be more dangerous.) – hynekcer Sep 02 '17 at 12:32
  • Forgive me, I misunderstood the relations. I thought it was possible to have the same `Show` with different `day/times`, but now that I look again, that's not the case. So yes, it's okay to add them to the grouping w/o aggregation. My point was that it's necessary to have aggregation over non-unique fields per group. For example, take a case where we group by `movie` instead of show: now it's possible for a movie to have many shows, and if we add it to values w/o aggregating on it, this will lead to duplicates (having the same movie returned for every show). – Todor Sep 06 '17 at 07:12
1

You have to group by show and then count the total number of movie tickets.

MovieTicket.objects.filter(user=23).values('show').annotate(total_tickets=Count('show')).values('show', 'total_tickets', 'show__movie__name', 'show__time', 'show__day__date')

Use this serializer class for the above queryset. It will give the required JSON output.

from rest_framework import serializers

class MySerializer(serializers.Serializer):
    show = serializers.IntegerField()
    movie = serializers.CharField(source='show__movie__name')
    time = serializers.TimeField(source='show__time')
    day = serializers.DateField(source='show__day__date')
    total_tickets = serializers.IntegerField()

It is not possible to order_by booked_at, since that information gets lost when we group by show. If we order by booked_at, the grouping happens on unique combinations of booked_at time and show id, which is why the ticket count came out as 1. Without order_by you will get the correct count.

EDIT:

Use this query:

queryset = (MovieTicket.objects.filter(user=23)
            .order_by('booked_at').values('show')
            .annotate(total_tickets=Count('show'))
            .values('show', 'total_tickets', 'show__movie__name',
                    'show__time', 'show__day__date'))

You cannot annotate on an annotated field, so you will have to find the total ticket count in Python. To calculate the total_tickets count for unique show ids:

tickets = {}
for obj in queryset:
    if obj['show'] not in tickets.keys():
        tickets[obj['show']] = obj
    else:
        tickets[obj['show']]['total_tickets'] += obj['total_tickets']

The final list of objects you need is tickets.values().

The same serializer above can be used with these objects.
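The merging loop can be checked on plain dictionaries. The sketch below assumes hypothetical sample rows shaped like the queryset output when the grouping has been split by booking time; the helper name `merge_by_show` is invented for illustration. It copies each first row so the input is not modified.

```python
# Hypothetical rows: show 4 appears twice because the groups were split.
rows = [
    {'show': 4, 'total_tickets': 1, 'show__movie__name': 'Lion king'},
    {'show': 7, 'total_tickets': 1, 'show__movie__name': 'Gone girl'},
    {'show': 4, 'total_tickets': 1, 'show__movie__name': 'Lion king'},
]


def merge_by_show(rows):
    """Sum total_tickets over rows that share the same show id."""
    tickets = {}
    for obj in rows:
        if obj['show'] not in tickets:
            tickets[obj['show']] = dict(obj)  # copy, keep the input intact
        else:
            tickets[obj['show']]['total_tickets'] += obj['total_tickets']
    return list(tickets.values())


merged = merge_by_show(rows)
print(merged)
```

The result keeps the order in which each show first appears, so the ordering of the queryset carries over to the merged list.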

zaphod100.10
  • I think your query and mine are about the same, and it is still giving me the same result (i.e., ``). I think I didn't quite get your answer. Could you please elaborate more? – Aamu Apr 28 '17 at 06:58
  • Yes, but I need to list out by the recently booked ticket. Now the list is ordered in respect to the time of the show. Is there any other way to list it out in that way? – Aamu Apr 30 '17 at 19:19
  • @Aamu: you cannot group by show and order by booked time. If you want to order by booked_at field on movie ticket model group_by will be based on the booked_at and show. I have added more info in answer to get the correct counts when you are ordering by booked_at too. – zaphod100.10 May 01 '17 at 04:45
  • @Aamu did you try out my answer? – zaphod100.10 May 05 '17 at 16:10
  • 1
    I used your answer in my own. Your problem was that .order_by('booked_at') broke the groups if it was used without aggregation `Max()`. The name 'total_tickets' is not important in `values()`. Otherwise it is a perfect answer. – hynekcer Sep 01 '17 at 14:03
1

You can try this.

Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('movieticket__user')).values('movie__name', 'time', 'day').distinct()

OR

Show.objects.filter(movieticket__user=23).values('id').annotate(total_tickets=Count('id')).values('movie__name', 'time', 'day').distinct()
Ashish