7

I was using ORDER BY RAND() to fetch random rows from the database without any issue, but I realised that as the database grows, RAND() causes heavy load on the server. So I looked for an alternative and tried generating one random number with PHP's rand() function and using it as the id in the MySQL query. That was very fast, since MySQL could look the row up by id, but the issue is that in my table not all id numbers are available, for example only 1, 2, 5, 9, 12 and so on.

If PHP's rand() generates 3, 4, etc., the query will return nothing, as there is no row with id 3, 4, etc.

What is the best way to generate random numbers, preferably from PHP, so that only ids that actually exist in the table are produced? It must check the table somehow. Please advise.

$id23=rand(1,100000000);
    SELECT items FROM tablea where status='0' and id='$id23' LIMIT 1

The above query is fast, but sometimes generates a number which is not available in the database.

    SELECT items FROM tablea where status=0 order by rand() LIMIT 1

The above query is too slow and causes heavy load on the server.

BenMorel
raviloves

4 Answers

8

First of all, generate a random value from 1 to MAX(id), not 100000000.

Then there are at least a couple of good solutions:

  1. Use > not =

    SELECT items FROM tablea where status='0' and id>'$id23' LIMIT 1
    

    Create an index on (status,id,items) to make this an index-only query.

  2. Use =, but just try again with a different random value if you don't find a hit. Sometimes it will take several tries, but often it will take only one try. The = should be faster since it can use the primary key. And if it's faster and gets it in one try 90% of the time, that could make up for the other 10% of the time when it takes more than one try. Depends on how many gaps you have in your id values.
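Option 2 could look something like this in PHP (a minimal sketch, assuming a PDO connection `$pdo` and the table from the question; the retry cap is an arbitrary safeguard, not part of the answer, so the loop can't spin forever on a very sparse table):

```php
// Fetch the current maximum id once, so rand() stays in range.
$maxId = (int) $pdo->query("SELECT MAX(id) FROM tablea")->fetchColumn();

$stmt = $pdo->prepare(
    "SELECT items FROM tablea WHERE status = '0' AND id = ? LIMIT 1"
);

$item = false;
for ($try = 0; $try < 20 && $item === false; $try++) {
    // Primary-key lookup: each attempt is a cheap point query.
    $stmt->execute([rand(1, $maxId)]);
    $item = $stmt->fetchColumn();   // false when the id fell in a gap
}
```

With few gaps, most calls succeed on the first try; with many gaps, option 1 (the `>` query) avoids the retries at the cost of a slightly less uniform selection.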

Bill Karwin
  • @bill-karvin So should I use > or =? Are you suggesting that > will be slow? – raviloves Jan 15 '13 at 03:52
  • 2
    Either way, it's much better than ORDER BY RAND(). :-) – Bill Karwin Jan 15 '13 at 03:56
  • @raviloves The first should be a little quicker (as there is no chance of having to rerun) but is slightly less random depending on where the gaps are in the ids. – Jim Jan 15 '13 at 04:06
  • @Jim, right, an id value following a gap is chosen more frequently. But the imperfectly-random choice may nevertheless be adequate for some applications. – Bill Karwin Jan 15 '13 at 04:07
  • If you use >, you need an `ORDER BY`, else it defeats the purpose of the random number. Also, `INDEX(status, id)` would further enhance the speed in _this_ case. – Rick James Sep 05 '16 at 19:28
8

Use your DB to find the max value from the table, generate a random number less than or equal to that value, grab the first row in which the id is greater than or equal to your random number. No PHP necessary.

SELECT items
FROM tablea
WHERE status = '0' and
      id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM tablea))
LIMIT 1
pyrospade
  • Thanks! The major benefit is not having to hit the DB twice just to ensure that you aren't under/over stating the max id of your table. – pyrospade Jan 15 '13 at 04:02
  • This particular solution has the drawback that it favors rows early in the table. It scans the table testing `id` against the re-evaluated expression repeatedly until success. – Rick James Sep 05 '16 at 19:35
  • I'm not as familiar with MySQL, but wouldn't this just be an index scan? Possibly even an index only scan if the index is on (id, status, items) – pyrospade Sep 05 '16 at 23:38
  • There are two steps -- Evaluate the subquery once; it needs an index on `id` (presumably `PRIMARY KEY(id)`); this is very efficient. Then it scans through part of the table. `INDEX(status, id, items)`, in _this_ order, is optimal and 'covering'. The problem is that the result will 'prefer' items after strings of `status=0` and/or after gaps in `id`. It could also return nothing - when `RAND()` is too close to 1.0. – Rick James Nov 15 '16 at 16:09
1

You are correct, ORDER BY RAND() is not a good solution if you are dealing with large datasets. Depending on how often it needs to be randomized, what you can do is generate a column with a random number and then update that number at some predefined interval.

You would take that column and use it as your sort index. This works well for a heavy-read environment and produces a predictable random order for a certain period of time.
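As a sketch of that idea (the column name `rand_val` and the cron-driven refresh are assumptions, not part of the answer):

```sql
-- One-time setup: a column to hold the precomputed random value, plus an index
ALTER TABLE tablea ADD COLUMN rand_val DOUBLE, ADD INDEX (rand_val);

-- Re-shuffle at some predefined interval, e.g. from a cron job
UPDATE tablea SET rand_val = RAND();

-- Reads then sort on the indexed column instead of calling RAND() per row
SELECT items FROM tablea WHERE status = '0' ORDER BY rand_val LIMIT 1;
```

Between refreshes every read sees the same order, which is what makes it cheap for heavy-read workloads.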

datasage
  • I was thinking something like you suggested, and I'd like to discuss what I have in mind: I will take row ids from the big database and store them in another table, keeping a maximum of 1000 numbers. I'll pick a random number from that table and, after the work is done, delete that row. How is that? – raviloves Jan 15 '13 at 03:50
1

A possible solution is to use limit:

$id23=rand(0, $numberOfRows - 1);

SELECT items FROM tablea where status='0' LIMIT $id23, 1

This won't produce any missed rows, but (as hek2mgl mentioned) it requires knowing the number of rows in the result set beforehand.
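In full, that could look like this (a sketch assuming a PDO connection `$pdo`; the variable names are illustrative):

```php
// Count the matching rows once, then pick a random zero-based offset.
$numberOfRows = (int) $pdo->query(
    "SELECT COUNT(*) FROM tablea WHERE status = '0'"
)->fetchColumn();

$id23 = rand(0, $numberOfRows - 1);

// LIMIT offset, row_count: skip $id23 rows, return one.
$stmt = $pdo->prepare("SELECT items FROM tablea WHERE status = '0' LIMIT ?, 1");
$stmt->bindValue(1, $id23, PDO::PARAM_INT);
$stmt->execute();
$item = $stmt->fetchColumn();
```

Note that MySQL still has to scan past all the skipped rows to honour the offset, so a large random offset is not a cheap operation.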

Jim
  • I would suggest this too :) But I wasn't sure whether the number of rows was already available – hek2mgl Jan 15 '13 at 03:53
  • @hek2mgl I have clearly written that I am using this one, but the problem is that not all rows are available, so this method will sometimes generate numbers which are not available, and in that case the query will be blank – raviloves Jan 15 '13 at 03:54
  • 2
    It turns out this is not fast, because it has to scan through $id23 rows to do the offset. Also it's not a quick operation to get the COUNT() of the number of rows. – Bill Karwin Jan 15 '13 at 03:54
  • @BillKarwin Thanks! I always wanted to know that. – hek2mgl Jan 15 '13 at 03:56
  • 1
    @BillKarwin Huh, I'd have assumed MySQL would figure out a clever way to jump into the middle of the dataset. Does this mean for large datasets pagination is best done using ids? – Jim Jan 15 '13 at 03:59
  • @Jim, right, you can't use an index to skip 100 rows and expect to find the value 101, because there might be gaps (the 100th row might actually have id 133 or something). And yes, this makes pagination hard to optimize. Better to use id's because then you can query `WHERE id>132 LIMIT 10` to get that page, without doing any OFFSET. But it means you need to know the first id on that page (hint: if you're displaying page n, you know the *last* id on that page, so the first id on the next page n+1 is greater). – Bill Karwin Jan 15 '13 at 04:06
  • 1
    @BillKarwin Ah, of course. No way to find the 5th record if you don't know what the first 4 are. – Jim Jan 15 '13 at 04:07