0

I have a Django project that consists of a scraper of our inventory, run on the server as a cronjob every few hours, and the Django Admin page - which we use to view / access all items.

We have about 30 items that are indexed. So each 'Scraping Operation' consists of about 30 individual 'Search Operations' each of which get around 500 results per run.

Now, this description is a bit confusing, so I've included the models below.

class ScrapingOperation(models.Model):
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=-1)
    trusted = models.BooleanField(default=True)


class Search(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=1)
    scraping_operation = models.ForeignKey(ScrapingOperation, on_delete=models.CASCADE, related_name='searches')
    trusted = models.BooleanField(default=True)

    def total_ads(self):
        return self.ads.count()


class Ad(models.Model): 

  item = models.ForeignKey(Item, on_delete=models.CASCADE, related_name='ads')    
  title = models.CharField(max_length=500)
  price = models.DecimalField(max_digits=8, decimal_places=2, null=True)         
  first_seen = models.DateTimeField(default=timezone.now, editable=True)
  last_seen = models.DateTimeField(default=timezone.now, editable=True)

  def __str__(self):
      return self.title

Now here is the problem we've run into.

On the admin pages for both the Search model and the SeachOperation model we would like to see the amount of ads scraped for that particular object (represented as a number) This works fine four our seachers, but our implementation for the SearchOperation has run into problems

This is the code that we use:

class ScrapingOperationAdmin(admin.ModelAdmin):
    list_display = ['id', 'completed', 'trusted', 'date_started', 'date_completed', 'number_of_ads']
    list_filter = ('completed', 'trusted')
    view_on_site = False

    inlines = [
        SearchInlineAdmin,
    ]

    def number_of_ads(self, instance):
        total_ads = 0
        for search in instance.searches.all():
            total_ads += search.ads.count()
        return total_ads

The problem that we have run into is this: The code works and provides the correct number, however, after +/- 10 ScrapingOperation we noticed that the site started to slow done when loading the page. We are now up to 60 ScrapingOperations and when we click the ScrapingOperations page in the Django admin it takes almost a minute to load.

Is there a more efficient way to do this? We thought about saving the total number of ads to the model itself, but it seems wasteful to dedicate a field to information that should be accessible with a simple .count() call. Yet our query is evidently so inefficient that the entire site locks down for almost a minute when it is executed. Does anyone have an idea of what we are doing wrong?

Based on the comments below I am currently working on the following solution:

def number_of_ads(self, instance):
    total_ads = 0
    searches = Search.objects.filter(scraping_operation=instance).annotate(Count('ads'))
    for search in searches:
        total_ads += search.ads__count
    return total_ads
Jasper
  • 2,131
  • 6
  • 29
  • 61

1 Answers1

1

Use an annotation when getting the queryset

from django.db.models import Count
class ScrapingOperationAdmin(admin.ModelAdmin):
     ...
     def get_queryset(self, request):
           qs = super().get_queryset(request)
           qs.annotate(number_of_ads=Count('searches__ads')
           return qs
kevswanberg
  • 2,079
  • 16
  • 21