I have this database filled with almost 2 million records, a lot of these are duplicated (different internal id but same 'idd')
Basically what I need is to query the database to get the first x records with only 1 product (distinct idd) and ordered by '-sold_count'
So this is the query I'm trying:
Product.objects.distinct('idd').order_by('idd', '-sold_count')[:2000]
Problem 1: That takes roughly 30 seconds, and I fear it will take longer with more data. How can I make it much faster?
Problem 2: Also it doesn't order it by sold_count descending, if anything it seems to be doing the opposite. How can I order it by -sold_count? Tried removing the '-' to experiment but results seem to be the same.
Other information:
idd is a CharField
sold_count is an IntegerField
Hope this is clear enough, if not you can ask any question you may have.
Models:
class Product(models.Model):
price = models.FloatField(default=False, null=True)
sold_count = models.IntegerField(default=False, null=True)
revenue = models.FloatField(default=False, null=True)
idd = models.CharField(max_length=300, default=False, null=True)
remaining = models.IntegerField(default=False, null=True)
category = models.CharField(max_length=300, default=False, null=True)
brand = models.CharField(max_length=300, default=False, null=True)
seller = models.CharField(max_length=300, default=False, null=True)
url = models.CharField(max_length=300, default=False, null=True)
name = models.CharField(max_length=300, default=False, null=True)
shipment = models.CharField(max_length=300, default=False, null=True)
view_count = models.IntegerField(default=False, null=True)
updated = models.DateTimeField(null=True)
Raw Query sent:
SELECT DISTINCT ON ("server_gitti"."idd") "server_gitti"."id", "server_gitti"."price", "server_gitti"."sold_count", "server_gitti"."revenue", "server_gitti"."idd", "server_gitti"."remaining", "server_gitti"."category", "server_gitti"."brand", "server_gitti"."seller", "server_gitti"."url", "server_gitti"."name", "server_gitti"."shipment", "server_gitti"."view_count", "server_gitti"."updated" FROM "server_gitti" ORDER BY "server_gitti"."idd" ASC, "server_gitti"."sold_count" ASC LIMIT 2000