1

I have this query:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)

Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()

But that doesn't work, I get back an empty Array. Any ideas why?

Here is my CheckinAct model:

class CheckinAct(models.Model):
    user = models.ForeignKey(User)
    location = models.ForeignKey(Location)
    time = models.DateTimeField()

----Update------ So now I have updated my query to look like this:

 checkins = CheckinAct.objects.values('user').\
                            filter(time__range=[start, end], location=checkin.location).\
                            annotate(dcount=Count('user'))

But I'm still getting multiple objects back that have the same user, like so:

 [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

---- Update 2------ Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.

checkins = CheckinAct.objects.filter(
                    time__range=[start, end],
                    location=checkin.location,
                ).annotate(dcount=Count('user')).values('user', 'dcount')
        logger.info("checkins!!! : " + str(checkins))

Logs the following:

checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.

Kevin Brown-Silva
  • 40,873
  • 40
  • 203
  • 237
b-ryce
  • 5,752
  • 7
  • 49
  • 79
  • `distinct` merely eliminates duplicate results (as in same everything, even `pk`). It doesn't make the results unique for one field or another. You're likely looking for this: http://stackoverflow.com/questions/629551/how-to-query-as-group-by-in-django – Chris Pratt Nov 23 '11 at 22:45
  • Thanks for the comment Chris, I think I'm getting closer, thanks for pointing me in the right direction. I have updated my post. – b-ryce Nov 23 '11 at 23:06

3 Answers3

1

The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)

For your use-case, the following should give you exactly what you want:

checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
                              values('user').annotate(checkin_count=Count('pk')).order_by()

UPDATE

Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):

User.objects.filter(checkinat__location=location).distinct().count()

UPDATE for non-rel support

checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))

This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.

Chris Pratt
  • 232,153
  • 36
  • 385
  • 444
  • After I get the checkins object, how do I tell how many unique users there are? I think this query is probably giving me what I need, but I don't know how to get the count of unique users out of it. When I just put str(checkins) I get duplicates, like this: [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 17462L}, {'user': 15521L}] So how do I read out just the count of how many unique users? – b-ryce Nov 29 '11 at 16:27
  • I think what you are referring to would work with a db that supports joins. Since I'm on Django non-rel, I get this error: "This database doesn't support JOINs and multi-table inheritance." – b-ryce Dec 05 '11 at 16:31
  • Oh. Ouch. Well, it can still be done, but it'll take extra queries. See update. – Chris Pratt Dec 05 '11 at 16:50
  • Chris. That was it. Thanks for your help! One quick last question. I'll look up coverting lists of tuples to dict to understand this better. – b-ryce Dec 19 '11 at 14:43
0

You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.

What you want to do to get your results is to use annotate first, and then values, such as:

CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')

The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).

Hope that helps!

Luke Sneeringer
  • 9,270
  • 2
  • 35
  • 32
0
checkins = CheckinAct.objects.values('user').\
                        filter(time__range=[start, end], location=checkin.location).\
                        annotate(dcount=Count('user'))

If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?

Can you tell me what happens when you try this?

checkins = CheckinAct.objects.values('user').\
                        filter(time__range=[start, end], location=checkin.location).\
                        annotate(Count('user')).order_by()

(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

jbm
  • 116
  • 3
  • I tried what you put, but when I do str(checkins) it looks like I'm still getting multiple occurrences of the same user. "checkins : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]" – b-ryce Nov 29 '11 at 16:27