In addition to all the different answers to this question, please remember the right solution depends on your use case.
Method 1 (big DB, no deletions, pk starts from 1)
If the records on your database have sequential primary keys with no gaps (if you never delete objects from your database):
Select one random object:
import random
# Get the biggest primary key
# (generally ):
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
# Pick a random primary key:
random_pk = random.randint(1, pk_max)
# Get that object:
random_object = MyModel.objects.get(pk=random_pk )
Select n random objects:
import random
n = 5
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), n)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)
# Note: this one returns a QuerySet
Problem: Will start crying if you ever have a gap in your primary key column. It makes a huge assumption about the future of your database for probably a very critical operation.
Method 2 (small DB, no deletions, pk starts from 1)
(Note: both A and B are not efficient at all. Only use these if you have to. If your DB table is small enough to do this, maybe you don't even need a table for this and can use something like enums, etc.)
A. By randomly ordering the ENTIRE table with SQL:
# Get one random object:
random_object = MyModel.objects.order_by('?').first()
# Get n Random objects:
random_objects = MyModel.objects.order_by('?')[:n]
Problem: It uses SQL's ORDER BY RAND()
, which may go on your criminal record as a misdemeanor in certain countries in the near future. Also there is a very good chance that it will generate a new temporary table and random numbers for every single record before ordering them based on these random numbers. See this answer for a great technical explanation.
B. By loading ALL the objects from the table first:
import random
# Don't do this!
every_single_object_ever = MyModel.objects.all()
# Get one random object:
random_object = random.choice(every_single_object_ever)
# Get n Random objects:
random_objects = random.choices(every_single_object_ever, k=n)
Problem: You are loading all the objects from your table as Django's MyModel instances to get a single(or several) record.
(Tbh 'every_single_object_ever is a queryset until you use it in the random.choice method')
You can improve this a little bit;
C. By loading ALL the ids from the table first:
Remember that this solution is still terrible, and now it uses two queries instead of one:
import random
every_single_pk_ever = MyModel.objects.values_list('id', flat=True)
# Get one random object:
random_object = random.choice(every_single_pk_ever )
# Get n Random objects:
random_objects = random.choices(every_single_pk_ever , k=n)
Problem: Loads every single pk in the table just to get some of them.
Method 3 (big DB with deletions(or gaps), pk is an integer)
Method 1 assumes that you have sequential primary keys with no gaps. We can get around this using SQL's OFFSET X LIMIT Y
. So instead of defining the primary keys, you will specify the index of the row.
This is almost identical to Emil Ivanov's answer:
# first query:
count = MyModel.objects.all().count()
# second query:
random_offset = random.randint(0,count-1)
MyModel.objects.all()[random_offset].get()
This works efficiently because Django uses python's slicing to modify the query's LIMIT and OFFSET values. See the docs for more on this.
Problem: Since this one uses two queries, total count of the records may change during this period, as Nelo Mitranim stated before. This may throw an exception. Notice how I used .get() at the end. This is to make sure the code throws a DoesNotExist exception if anything goes wrong. Without the .get(), it would throw an IndexError instead.
For multiple objects, you could do MyModel.objects.all()[random_offset-n:random_offset]
but this would make your 'random' objects have sequential row numbers, which means it is more of a random slice from your database. For a list of completely random objects, you would need to get random.sample() between 1 and your count, and run the MyModel.objects.all()[random_offset]
line in a for loop. Which gets quite painful if you need lets say 50 random objects. Which means:
We need another solution for gathering multiple random objects.
Let's say we need 3 random objects. What happens if I do this?
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), 10)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)[:3]
It translates to this:
SELECT * FROM `myapp_mymodel` WHERE id IN (1, 3, 7, 8, 11, 13, 17, 21, 25, 29) LIMIT 3
I only need 3 objects. But I am giving it a list of 10 possible primary keys to choose from. This lowers my chances of hitting a gap and ending up with zero, one or two objects. Performance? Well, since your database engine stops looking for more records when it hits the LIMIT, generating random numbers at the beginning is the only performance sacrifice you are making. Which is not much. If you are positive that you have enough records, you can send a thousand possible primary keys instead:
pk_max= MyModel.objects.all().aggregate(pk_max=Max("pk"))['pk_max']
random_pk_list = random.sample(range(1,pk_max+1), 1000)
random_objects = MyModel.objects.filter(pk__in=random_pk_list)[:3]
Method 4 (Database specific SQL queries, aka getting your hands dirty)
You can always execute raw queries with Django. This might be as simple as this for some database engines:
random_object_list = MyModel.objects.raw('SELECT * FROM myapp_mymodel LIMIT 1 OFFSET TRUNCATE(RANDOM() * (SELECT COUNT(*) FROM myapp_mymodel))')
But most of them require a constant for the OFFSET. A working alternative for the Mysql can be found below(Note: this is an overkill). It locks the table. Sets a random offset based on the count of objects in our table to a variable, and executes the Select query with this random OFFSET value.
from django.db import connection
with connection.cursor():
cursor.execute('LOCK TABLE myapp_mymodel WRITE;')
cursor.execute("SET @randomoffset=CAST(truncate((SELECT COUNT(*) FROM
myapp_mymodel)*RAND(),0)+0 AS SIGNED);")
cursor.execute("PREPARE STMT FROM 'SELECT * FROM myapp_mymodel LIMIT 1 OFFSET ?';")
cursor.execute("EXECUTE STMT USING @randomoffset;")
row = cursor.fetchone()
cursor.execute('UNLOCK TABLES;')
print(row)
You can still increase the performance. Two naive solutions I can think of are importing MySQLDb directly and executing above commands in a single cursor.execute() without the ugly table lock, or using ROWID(if supported) in a where clause instead of the OFFSET since high OFFSETs can cause performance issues.
Final note:
Since it is easier to ask for forgiveness than permission, I'd say just go with method 3, and if you encounter an error, try it again. Do it in a while loop if you really need it, most of the time the query will only need to run once.