
I have looked at some posts on SO about random querysets in Django. All I can find is that order_by('?') is not fast or efficient at all, but my experience does not bear that out.

I have a table with 30,000 entries (the final production table will have around 200,000 entries). I could split it into separate tables of around 10-15k entries each.

So I want a very fast and efficient way to get 100 (maybe 200) random items. The idea of generating a list of 100 random numbers is, in my opinion, not good enough, because some PKs will be missing (due to deletions, etc.).

And I don't want to generate one random number and then take the 99 following items. I will be using PostgreSQL (for no special reason; I can choose another database if one is better). I tested order_by('?')[:100] and it seems very fast (I think): it took only 0.017 s per list.
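One way around the missing-PK problem mentioned above is to sample from the PKs that actually exist and then fetch those rows. A minimal sketch in plain Python: the `existing_pks` list here stands in for something like `Model.objects.values_list('id', flat=True)`, which is an assumed model query, not code from the question.

```python
import random

# Simulated primary keys with gaps left by deleted rows
existing_pks = [1, 2, 3, 5, 8, 9, 12, 15, 20, 21, 22, 30]

def random_pk_sample(pks, k):
    """Pick k distinct PKs uniformly from the PKs that actually exist."""
    return random.sample(pks, k)

picked = random_pk_sample(existing_pks, 5)
# In Django this could then become: Model.objects.filter(pk__in=picked)
```

The cost of this approach is fetching the PK list first, which is usually a cheap index-only scan compared to sorting the whole table.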

  • Why do the docs say that this is not a good operation for random ordering?
  • Which random approach do you prefer?
  • Is there any better way to do this?
juree

1 Answer

```sql
ORDER BY random()
LIMIT n
```

is a valid approach, but it is slow because every single row in the table has to be considered.
This is still fast with 30k rows, but with 30M rows ... not so much.

I suggest this related question:
Best way to select random rows PostgreSQL
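To see why the cost grows with table size: `ORDER BY random()` effectively assigns a random key to every row and sorts all of them, even though only n rows survive the LIMIT. A rough Python analogue of that idea (a sketch, not what PostgreSQL literally executes):

```python
import random

def order_by_random_limit(rows, n):
    # Every row gets a random sort key and the whole set is sorted:
    # O(N log N) work for N rows, even though only n rows are returned.
    keyed = [(random.random(), row) for row in rows]  # touches every row
    keyed.sort()
    return [row for _, row in keyed[:n]]

sample = order_by_random_limit(range(30_000), 100)
```

This is why the approach feels fine at 30k rows but degrades as the table grows: the work is proportional to the whole table, not to the 100 rows you actually want.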

Erwin Brandstetter
  • This is the same as order_by('?')[:n]. Okay. What if there are many requests for this operation: is that bad for the database server or not? – juree Jun 23 '14 at 20:18
  • @juree: Hard to tell. Depends on a lot of variables. You'd have to test. I suggest you study the answers to the previous (linked) question for faster solutions. – Erwin Brandstetter Jun 23 '14 at 20:25
    @juree Yes, this is the query generated by using `order_by('?')[:n]`. I suggest you preserve your sanity for now and use this approach instead of writing a large raw SQL query for an optimisation that you don't need at the moment. The [Zen of Python](http://legacy.python.org/dev/peps/pep-0020/) was written for a reason. :) If your database size increases (as in millions of rows) and the query becomes unbearably slow, you can optimise the query or implement some sort of caching mechanism, but if you don't have performance issues, it's too early to optimise. – knbk Jun 23 '14 at 20:27
    Already studying it. I think I saw that solution before, but I didn't think it was right for me. Obviously I need to look more carefully. Thanks, Erwin. @knbk, sure, this is a solution, but as I said, I've read about it, and better programmers say it is not a good approach. So if I can find something better from the start, why not :) – juree Jun 23 '14 at 20:28