0

I have a Django MySQL table that holds a list of items that have been submitted to my server. The table is currently very small (low 100's) but I expect it to eventually get to several 100,000s.

I want my database query to return a random set of 30 results from this table.

My table model is:

class Item(models.Model):

    ITEM_TYPES = (
        ('V', 'Vine'),
        ('Y', 'YouTube'),
        ('P', 'Photo'),         # Photo is stored by us on a CDN somewhere
        ('F', 'Flickr'),
        ('I', 'Instagram'),
        ('D', 'DeviantArt'),
        ('5', '500px'),
    )
    owner = models.ForeignKey(User, on_delete=models.CASCADE)     
    title = models.CharField(max_length=60, default='')           
    url = models.CharField(max_length=250, default='', unique=True)
    item_type = models.CharField(max_length=1, choices=ITEM_TYPES)    
    keywords = models.ManyToManyField(Keyword, related_name='keywords')
    credits_applied = models.IntegerField(default=5000)
    credits_left = models.IntegerField(default=10)
    credits_gifted = models.IntegerField(default=0)
    date_added = models.DateTimeField(auto_now_add=True)               
    liked = models.IntegerField(default=0)                        
    disliked = models.IntegerField(default=0)                        
    active = models.BooleanField(default=True)
    comment = models.CharField(max_length=100, blank=True)         
    approved = models.BooleanField(default=0)       

My query currently looks like this:

Item.objects.filter(item_type='P', active=True, keywords__name__in=item_categories)[30]

I know that I can randomize the results using order_by('?') in my query but this apparently gets to be an extremely inefficient and time consuming way to do this for large tables.

I have searched extensively for a solution but have failed to find one. I would imagine this is something that quite a lot of people want to do?

The closest solution I have found in Stackoverflow is here:

In Django, how do I select 100 random records from the database?

But the accepted answer there is for the '?' option which scales very very badly for large tables. The other answer is only suitable if you are doing random queries once or just a few times. I need to do random queries 1000s of times a day.

The only solution I can think of is to do a query to return a list of IDs that satisfy the query conditions, and then randomly select 30 IDs from that list, and then do another query to return the records for those IDs. Is there a better way?

Community
  • 1
  • 1
Bill Noble
  • 6,466
  • 19
  • 74
  • 133
  • Possible duplicate of [In Django, how do I select 100 random records from the database?](http://stackoverflow.com/questions/3506678/in-django-how-do-i-select-100-random-records-from-the-database) – e4c5 May 08 '16 at 13:02
  • The approved answer in the link you give is for the solution that uses '?'. This scales really really badly. The other answer is for when one wants a one off, or very occasional random result. I need to produce random results frequently. Many thousands of time a day. – Bill Noble May 08 '16 at 13:05
  • Your solution seems like a good way to do it... fetch IDs, randomize in Python, and then fetch the objects. Two very efficient queries instead of one extremely expensive one. – solarissmoke May 09 '16 at 03:13

0 Answers0