42

I am trying to get a random object from a model A.

For now, it is working well with this code:

random_idx = random.randint(0, A.objects.count() - 1)
random_object = A.objects.all()[random_idx]

But I feel this code is better:

random_object = A.objects.order_by('?')[0]

Which one is better? Could the first version have a problem with deleted objects? For example, I could have 10 objects, but the object with id 10 no longer exists. Have I misunderstood something about A.objects.all()[random_idx]?

Erwan
  • Why would you make 2 queries (one for count, one for actual select) instead of 1? – Selcuk Apr 02 '14 at 15:51
  • 2
    I think the second one is probably better, but the first one isn't subject to the problem you describe, because it's indexing a list you've already bounded, not selecting by the database ID. Also, why not `random.choice(A.objects.all())`? – Two-Bit Alchemist Apr 02 '14 at 15:51
  • possible duplicate of [How to pull a random record using Django's ORM?](http://stackoverflow.com/questions/962619/how-to-pull-a-random-record-using-djangos-orm) – alecxe Apr 02 '14 at 15:54
  • 2
    @Two-BitAlchemist blergh, that's the worst of all: getting all rows from the database in order to return just one. – Daniel Roseman Apr 02 '14 at 15:55
  • @DanielRoseman It's also plenty readable, leaves `A.objects.all()` in order (unlike solution 2) if it's used somewhere else, and concisely illustrates another potential use case. I don't see anything asking about _performance_, just what will work, and for a small number of objects, readability is more important. – Two-Bit Alchemist Apr 02 '14 at 15:58
  • @alecxe I don't think it is a duplicate. I already read the answers on this thread before submitting mine but my question is more accurate, and the answers and comments here are more interesting. Just my opinion... – Erwan Apr 02 '14 at 16:16

9 Answers

83

Just been looking at this. The line:

random_object = A.objects.order_by('?')[0]

has reportedly brought down many servers.

Unfortunately, Erwan's code caused an error when accessing non-sequential ids.

There is another short way to do this:

import random

items = list(Product.objects.all())

# change 3 to how many random items you want
random_items = random.sample(items, 3)
# if you want only a single random item
random_item = random.choice(items)

The good thing about this is that it handles non-sequential ids without error.

nik_m
lukeaus
  • 5
    Looking at the documentation of the `random` module, `random.sample(items, 1)[0]` can be avoided by using `random.choice(items)`. See [random.choice](https://docs.python.org/3/library/random.html#random.choice). – Acsor Aug 11 '17 at 19:15
  • 2
    If you want to get the object from `random.choice(items)`, use `items = list(Product.objects.all())` – therealak12 Jun 19 '20 at 17:40
  • 2
    Watch out if your Product table is very large, you will load in memory all your products, which can fill the memory pretty fast. I think the values_list('pk', flat=True) approach proposed by @km6 is better in that regard. – Benjamin_Mourgues Nov 24 '22 at 10:50
31

Improving on all of the above:

from random import choice

pks = A.objects.values_list('pk', flat=True)
random_pk = choice(pks)
random_obj = A.objects.get(pk=random_pk)

We first get a list of potential primary keys without loading any Django object, then we randomly choose one primary key, and then we load the chosen object only.

km6
12

The second bit of code is correct, but can be slower, because in SQL that generates an ORDER BY RANDOM() clause that shuffles the entire set of results, and then takes a LIMIT based on that.
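
As a rough pure-Python analogy for why the ORDER BY RANDOM() form scales with table size, consider the sketch below. It is not what any particular database does internally, and `rows` and both helper names are made up for illustration. Conceptually, every row gets a random sort key and the whole set is ordered before one row is kept, whereas a direct pick reads a single element.

```python
import random

def pick_by_random_order(rows, rng):
    """Analogy for ORDER BY RANDOM() LIMIT 1: key every row, sort, keep the first."""
    keyed = [(rng.random(), row) for row in rows]  # one random key per row
    keyed.sort(key=lambda pair: pair[0])           # work grows with the table size
    return keyed[0][1]

def pick_by_index(rows, rng):
    """Direct pick: choose one position and read only that element."""
    return rows[rng.randrange(len(rows))]

rng = random.Random(0)        # seeded only to make the sketch repeatable
rows = list(range(1, 1001))   # stand-in for a table with 1000 rows
a = pick_by_random_order(rows, rng)
b = pick_by_index(rows, rng)
```

Both helpers return a uniformly random row; the difference is only in how much work is done to get it.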

The first bit of code is translated into a LIMIT/OFFSET query, and a large OFFSET is slow because the database still has to step through the skipped rows. E.g., what if your random_idx is near the last possible index?

A better approach is to pick a random ID from your database and select by it (a primary key lookup, so it's fast). We can't assume that every id between 1 and MAX(id) exists, because rows may have been deleted. So the following is an approximation that works out well:

import random

# grab the max id in the database
max_id = A.objects.order_by('-id')[0].id

# grab a random possible id. we don't know if this id exists in the database, though
# (randint includes both endpoints, so max_id is the largest value needed)
random_id = random.randint(1, max_id)

# return the object with that id, or the first object with a greater id
# this is a fast lookup, because the primary key is indexed
random_object = A.objects.filter(id__gte=random_id).order_by('id')[0]
Sohan Jain
  • 1
    The first code does not evaluate the entire list. Slices in Django querysets are translated into LIMIT/OFFSET calls in the SQL. – Daniel Roseman Apr 02 '14 at 16:05
  • What I meant is: LIMIT/OFFSET in SQL is notoriously slow, because it has to nearly evaluate the entire list. – Sohan Jain Apr 02 '14 at 16:06
  • You should replace the `get` by `filter`. Now you get the following error: `TypeError: 'A' object does not support indexing` – J. Ghyllebert Aug 08 '14 at 13:31
  • 1
    I would replace all "id"s with "pk"s. For more information, take a look at http://stackoverflow.com/questions/2165865/django-queries-id-vs-pk – 1man May 09 '16 at 23:59
  • This is not working if there are too many gaps in the PKs, like in a table that is constantly re-imported. – Risadinha Sep 22 '17 at 09:27
  • 2
    Not a very good random distribution. Imagine you have 3 objects with ids 1, 2 and 99 (the others were removed). In this case there is a 98% probability that your algorithm returns 99 – rluts Feb 24 '20 at 20:46
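
The skew described in the last comment can be checked by enumeration rather than by sampling. This is a pure-Python sketch, assuming the random id is drawn from 1 to max_id inclusive and that the lookup returns the smallest existing id at or above it; `pick_by_gte` is a hypothetical stand-in for the `id__gte` query, not Django code.

```python
import bisect
from collections import Counter

def pick_by_gte(sorted_ids, random_id):
    """Return the smallest existing id that is >= random_id."""
    return sorted_ids[bisect.bisect_left(sorted_ids, random_id)]

ids = [1, 2, 99]  # the example from the comment: everything between 2 and 99 was deleted

# Try every value the random draw could produce and count which row it lands on.
counts = Counter(pick_by_gte(ids, r) for r in range(1, max(ids) + 1))
share_99 = counts[99] / sum(counts.values())
```

Each row is picked in proportion to the size of the gap below it, so rows that follow a long run of deleted ids are heavily over-represented.
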
7

How about calculating the maximum primary key and picking a random pk?

The book ‘Django ORM Cookbook’ compares the execution time of the following functions for getting a random object from a given model.

import random

from django.db.models import Max
from myapp.models import Category

def get_random():
    return Category.objects.order_by("?").first()

def get_random3():
    max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
    while True:
        pk = random.randint(1, max_id)
        category = Category.objects.filter(pk=pk).first()
        if category:
            return category

The test was run against a million DB entries:

In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253

In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794

See source.

After seeing those results I started using the following snippet:

from django.db.models import Max
import random

def get_random_obj_from_queryset(queryset):
    max_pk = queryset.aggregate(max_pk=Max("pk"))['max_pk']
    if max_pk is None:  # empty queryset: nothing to pick
        return None
    while True:
        obj = queryset.filter(pk=random.randint(1, max_pk)).first()
        if obj:
            return obj

So far it has done the job, as long as the model has an integer id. Notice that get_random3 (and get_random_obj_from_queryset) won’t work if you replace the model id with a uuid or something else. Also, if too many instances have been deleted, the while loop will slow the process down.
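
The retry loop above is a form of rejection sampling: every existing pk is equally likely to be returned, and the cost of gaps shows up as extra attempts (on average max_pk divided by the number of existing rows). A pure-Python sketch of that trade-off follows, with a set standing in for the table; `pick_existing_pk` and the sample pks are hypothetical.

```python
import random
from collections import Counter

def pick_existing_pk(existing_pks, max_pk, rng):
    """Draw candidate pks until one exists; return it with the number of tries."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.randint(1, max_pk)  # both endpoints are inclusive
        if candidate in existing_pks:
            return candidate, tries

rng = random.Random(42)   # seeded only to make the sketch repeatable
existing = {1, 2, 99}     # a sparse table: 3 rows, max pk 99
draws = [pick_existing_pk(existing, 99, rng) for _ in range(3000)]
counts = Counter(pk for pk, _ in draws)
average_tries = sum(tries for _, tries in draws) / len(draws)
```

With these numbers each pk should come up about a third of the time, while the average number of attempts is around 33, and every attempt is a database query in the real function.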

Pawel Kam
1

Yet another way:

from random import randint

pks = A.objects.values_list('pk', flat=True)
random_idx = randint(0, len(pks)-1)
random_obj = A.objects.get(pk=pks[random_idx])

Works even if there are larger gaps in the pks, for example if you want to filter the queryset before picking one of the remaining objects at random.

EDIT: fixed the call to randint (thanks to @Quique). The stop argument is inclusive.

https://docs.python.org/3/library/random.html#random.randint
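
A small sketch of the inclusive-stop detail, using a plain list of made-up primary keys in place of the values_list queryset. `randint` includes both endpoints, `randrange` excludes the stop value, and `choice` avoids the index arithmetic altogether.

```python
import random

pks = [3, 7, 8, 15, 42]  # hypothetical primary keys with gaps

rng = random.Random(1)   # seeded only to make the sketch repeatable
idx_a = rng.randint(0, len(pks) - 1)  # randint includes both endpoints
idx_b = rng.randrange(len(pks))       # randrange excludes the stop value
pk_c = rng.choice(pks)                # choice picks an element directly
```

Calling `randint(0, len(pks))` instead would occasionally produce an index one past the end and raise IndexError.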

Risadinha
0

I'm sharing my latest test result with Django 2.1.7, PostgreSQL 10.

students = Student.objects.all()
for i in range(500):
    student = random.choice(students)
    print(student)

# 0.021996498107910156 seconds

for i in range(500):
    student = Student.objects.order_by('?')[0]
    print(student)

# 0.41299867630004883 seconds

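In this test, random fetching with random.choice() is about 19 times faster (0.022 s versus 0.413 s). Note that the first loop evaluates the queryset only once and then picks from the cached results, while the second loop issues a new query on every iteration.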

Exis Zhang
0

Taking Django's lazy database access into account, the naive time for selecting a random element basically comes down to the time it takes to run len(A.objects.all()).

On the database I am trying this out on, it takes a few seconds to do this.

The solution suggested below is instant.

A better way is to wrap the query in a Paginator object:

import random
from django.core.paginator import Paginator


paginator = Paginator(Sample.objects.all().order_by('pk'), 25)
random_page = paginator.get_page(random.choice(paginator.page_range))
random_sample = random.choice(random_page.object_list)

The page size of 25 objects is just a guess at a good value.

So basically, we choose a random page, and in that page we choose a random sample.
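
One property of this two-step pick is worth checking: if the last page is shorter than the others, its objects are chosen more often. The sketch below works out the per-object probabilities with exact fractions. It assumes pages are chosen uniformly and the last page simply holds the remainder; `selection_probabilities` is a hypothetical helper, not part of Django.

```python
from fractions import Fraction

def selection_probabilities(total, per_page):
    """Per-object pick probability on each page: P(page) * P(object within page)."""
    page_sizes = [min(per_page, total - start) for start in range(0, total, per_page)]
    page_prob = Fraction(1, len(page_sizes))
    return [page_prob / size for size in page_sizes]

# 52 objects at 25 per page gives pages of 25, 25 and 2 objects.
probs = selection_probabilities(52, 25)
```

Here an object on a full page has probability 1/75, while each of the two objects on the last page has probability 1/6. The pick is exactly uniform only when the total is a multiple of the page size.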

henrikstroem
-1

In Python, you can use the random module to get a random member of a sequence such as a list or a tuple. (A set has to be converted to a list first, because it does not support indexing.)

The random module has a function named choice, which takes a sequence and returns one of its members at random.

Because random.choice only needs len() and indexing, you can also use it on a queryset in Django.

first import the random module:

import random

then create a list:

my_iterable_object = [1, 2, 3, 4, 5, 6]

or create a query_set like this:

my_iterable_object = mymodel.objects.filter(name='django')

and for getting a random member of your iterable object use choice method:

random_member = random.choice(my_iterable_object)
print(random_member) # my_iterable_object is [1, 2, 3, 4, 5, 6]

3

full code:

import random

my_list = [1, 2, 3, 4, 5, 6]

random.choice(my_list)

2

  • 2
    While this code snippet may solve the question, [including an explanation](//meta.stackoverflow.com/q/392712/4733879) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – Filnor Sep 03 '20 at 09:13
-1
import random


def get_random_obj(model, length=-1):
    if length == -1:
        length = model.objects.count()

    return model.objects.all()[random.randint(0, length - 1)]


#to use this function
random_obj = get_random_obj(A)
Mhmoud Sabry