2

I am using django-hitcount to tally hits to my database objects. I want to count the hits by object to determine which object has the most hits in a given time range. The app has two models of interest here:

class Hit(models.Model):
    created         = models.DateTimeField(editable=False)
    ip              = models.CharField(max_length=40, editable=False)
    session         = models.CharField(max_length=40, editable=False)
    user_agent      = models.CharField(max_length=255, editable=False)
    user            = models.ForeignKey(User,null=True, editable=False)
    hitcount        = models.ForeignKey(HitCount, editable=False)

class HitCount(models.Model):
    hits            = models.PositiveIntegerField(default=0)
    modified        = models.DateTimeField(default=datetime.datetime.utcnow)
    content_type    = models.ForeignKey(ContentType,
                        verbose_name="content cype",
                        related_name="content_type_set_for_%(class)s",)
    object_pk       = models.TextField('object ID')
    content_object  = generic.GenericForeignKey('content_type', 'object_pk')

"Hit" logs each hit with a timestamp while HitCount stores overall number of hits. To get the hits by object and within a time range, I need to do the following:

Filter Hit by date created Count number of hits per content_object (within time range filtered above) order by count calculated above return content_object and count

This could be very expensive, so I planned on calcing/caching once per day.

As a first step, I wanted to count the number of hits per content_object irrespective of time range.

limited_hc = Hit.objects.all().values('hitcount__content_object').annotate(count = Count('hitcount__object_pk'))

I immediately run into an issue:

Cannot resolve keyword 'hitcount__content_object' into field. Choices are: created, hitcount, id, ip, session, user, user_agent

After some digging, I found that annotation and generic relations do not work well together. If I use object_pk instead of content_object, it works fine, but then I don't have the name of the object.

So my question: What is an alternative to achieve the same result? How can group by object but also retain the name?

I do have the model (content_type) and the id (object_pk), so I could always pull these separately, but that seems inelegant. . .

Ed.
  • 4,439
  • 10
  • 60
  • 78
  • No possibility to try that right now, but have you tried `limited_hc = Hit.objects.all().values('hitcount__content_type', 'hitcount__object_pk').annotate(count = Count('hitcount__object_pk'))`? – Bernhard Vallant Nov 19 '10 at 11:57
  • That works, but I'm not getting the link to the actual object. So I can't get the name, only the pk. I'd then have to ping the database again to pull the name data from the respective table – Ed. Nov 19 '10 at 16:41

1 Answers1

1

It will probably be more efficient for your purpose to add a generic relation to Hit model:

class Hit(models.Model):
    ...
    object_id = models.PositiveIntegerField()
    content_type = models.ForeignKey(ContentType)
    content_object = generic.GenericForeignKey('content_type', 'object_id')

and then run count() query on the Hit directly:

t = ContentType.objects.get_for_model(the_object_being_hit)
id = the_object_being_hit.id
count = Hit.objects.filter(
                   created__range=(from_timestamp, to_timestamp),
                   content_type = t,
                   object_id = id
                 ).count()

You can use Django South migration system to modify the hit-count's model. You can also try to subclass Hit after monkey-patching it's Meta class, or just define your own models that suit your needs better.

edit If you want to count hits for a whole class of objects or several classes, then you can have:

count = Hit.objects.filter(
                     created__range = myrange,
                     content_type__in = set_of_types
                   ).count()

Where the set_of_types can be either a list constructed with get_for_model calls or a query set obtained by direct filtering of the ContentType table.

The nice part of the count() method is that it makes counting happen in the database, which is much faster.

To get a breakdown by content_type try this:

counts = Hit.objects.filter(
                   created__range = myrange
                ).values(
                   'content_type'
                ).annotate(
                   Count('content_type')
                )

That should return a dictionary of counts vs content type id, quite close to what you want.

Evgeny
  • 10,698
  • 9
  • 60
  • 70
  • I worked on this solution. I'm not sure it gives me what I want though. Instead of the hit count of one object (given a time range, content_type, and object_id), I want the hit counts of all the objects (given a time range and content_type) so that I can rank them. It seems that even when I restructure the models.py as you describe, I can't use annotate and content_object at the same time. – Ed. Nov 19 '10 at 18:14
  • updated my post. well, I'd simply not use annotate in this case because your query would be doing too much work and would be I think quite slow for large sets. IMO the lazy approach in counting might work better in this case. – Evgeny Nov 19 '10 at 18:35
  • Thank you for following up. This still isn't quite achieving my objective. With this, I'm still getting one number. I'm hoping for an iterable list. So let's say, in your last block of code, if we left out the .count(), we'd get 5 objects: 3 of one type A and 2 of type B. I want the query to give me 2 objects and the count for each: Type A (count = 3) and Type B (count = 2). Not simply 5 (which is what I'm getting from the code above). – Ed. Nov 19 '10 at 20:53
  • Wasn't that clear in your question, but I've tried put up an option for that. – Evgeny Nov 19 '10 at 22:48
  • I apologize for any confusion. Thank you for your help. I think I will take the counts as we've reached in this solution and simply have the database grab the object information one by one. I would have liked to use annotate(Count('content_object')), but as I state in the question, this appears to be broken in Django. – Ed. Nov 20 '10 at 23:00