I have a Django MySQL table that holds a list of items that have been submitted to my server. The table is currently very small (low 100's) but I expect it to eventually get to several 100,000s.
I want my database query to return a random set of 30 results from this table.
My table model is:
class Item(models.Model):
ITEM_TYPES = (
('V', 'Vine'),
('Y', 'YouTube'),
('P', 'Photo'), # Photo is stored by us on a CDN somewhere
('F', 'Flickr'),
('I', 'Instagram'),
('D', 'DeviantArt'),
('5', '500px'),
)
owner = models.ForeignKey(User, on_delete=models.CASCADE)
title = models.CharField(max_length=60, default='')
url = models.CharField(max_length=250, default='', unique=True)
item_type = models.CharField(max_length=1, choices=ITEM_TYPES)
keywords = models.ManyToManyField(Keyword, related_name='keywords')
credits_applied = models.IntegerField(default=5000)
credits_left = models.IntegerField(default=10)
credits_gifted = models.IntegerField(default=0)
date_added = models.DateTimeField(auto_now_add=True)
liked = models.IntegerField(default=0)
disliked = models.IntegerField(default=0)
active = models.BooleanField(default=True)
comment = models.CharField(max_length=100, blank=True)
approved = models.BooleanField(default=0)
My query currently looks like this:
Item.objects.filter(item_type='P', active=True, keywords__name__in=item_categories)[30]
I know that I can randomize the results using order_by('?')
in my query but this apparently gets to be an extremely inefficient and time consuming way to do this for large tables.
I have searched extensively for a solution but have failed to find one. I would imagine this is something that quite a lot of people want to do?
The closest solution I have found in Stackoverflow is here:
In Django, how do I select 100 random records from the database?
But the accepted answer there is for the '?' option which scales very very badly for large tables. The other answer is only suitable if you are doing random queries once or just a few times. I need to do random queries 1000s of times a day.
The only solution I can think of is to do a query to return a list of IDs that satisfy the query conditions, and then randomly select 30 IDs from that list, and then do another query to return the records for those IDs. Is there a better way?