1

Let's say we have the following simplistic models:

class Category(models.Model):
    name = models.CharField(max_length=264)

   def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = "categories"

class Status(models.Model):
    name = models.CharField(max_length=264)

   def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = "status"

class Product(models.Model):
    title = models.CharField(max_length=264)
    description = models.CharField(max_length=264)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=10)
    status = models.ForeignKey(Status, on_delete=models.CASCADE)

My aim is to get some statistics, like total products, total sales, average sales etc, based on which price bin each product belongs to.

So, the price bins could be something like 0-100, 100-500, 500-1000, etc.

I know how to use pandas to do something like that: Binning column with python pandas

I am searching for a way to do this with the Django ORM.

One of my thoughts is to convert the queryset into a list and apply a function to get the apropriate price bin and then do the statistics.

Another thought which I am not sure how to impliment, is the same as the one above but just apply the bin function to the field in the queryset I am interested in.

IordanouGiannis
  • 4,149
  • 16
  • 65
  • 99

1 Answers1

0

There are three pathways I can see.

First is composing the SQL you want to use directly and putting it to your database with a modification of your models manager class. .objects.raw("[sql goes here]"). This answer shows how to define group with a simple function on the content - something like that could work?

SELECT FLOOR(grade/5.00)*5 As Grade, 
       COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1

Second is that there is no reason you can't move the queryset (with .values() or .values_list()) into a pandas dataframe or similar and then bin it, as you mentioned. There is probably a bit of an efficiency loss in terms of getting the queryset into a dataframe and then processing it, but I am not sure that it would certainly or always be bad. If its easier to compose and maintain, that might be fine.

The third way I would try (which I think is what you really want) is chaining .annotate() to label points with the bin they belong in, and the aggregate count function to count how many are in each bin. This is more advanced ORM work than I've done, but I think you'd start looking at something like the docs section on conditional aggregation. I've adapted this slightly to create the 'price_class' column first, with annotate.

Product.objects.annotate(price_class=floor(F('price')/100).aggregate(
      class_zero=Count('pk', filter=Q(price_class=0)),
      class_one=Count('pk', filter=Q(price_class=1)),
      class_two=Count('pk', filter=Q(price_class=2)),    # etc etc
  )

I'm not sure if that 'floor' is going to work, and you may need 'expression wrapper' to ensure the push price_class into the write type of output_field. All the best.

Atcrank
  • 439
  • 3
  • 11
  • can you please explain what is `floor` here? – Chiefir May 10 '18 at 11:46
  • Floor truncates a number to the lowest whole number. It would be the same as integer divide by 1. Floor(1.2) = 1. Floor(2.9)=2. It's useful a shortcut way to define your 'bins' in these examples. I took the SQL straight from another answer and it presumably works; but I didn't try it in the "annotate" statement and I can't vouch for it working there, although floor is part of python, it may not be in the Django ORM in a way that my suggestion is valid. – Atcrank May 10 '18 at 23:06