3

Using MySQL 5.6.21 on Windows 7.

I am attempting to return a 'random' row from a table seeded by the date (so the same row returns for the current day and then switches the next day etc - a "random quote of the day generator" if you like).

I noticed the same rows keep coming up so I simplified the query to its basics, it appears the RAND() function generates very similar numbers every fourth seed value. When rounded to an integer the values appear to repeat every fourth seed. This example only uses 16 rows, but you get the idea.

create table t (i INT);

insert into t values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);

select i, ceil(rand(i) * 16), rand(i) from t;

drop table t;

Gives...

0   3   0.15522042769493574
1   7   0.40540353712197724
2   11  0.6555866465490187
3   15  0.9057697559760601
4   3   0.15595286540310166
5   7   0.40613597483014313
6   11  0.6563190842571847
7   15  0.9065021936842261
8   3   0.15668530311126755
9   7   0.406868412538309
10  11  0.6570515219653505
11  15  0.907234631392392
12  3   0.15741774081943347
13  7   0.40760085024647497
14  11  0.6577839596735164
15  15  0.9079670691005579

Not what I expected, so what am I doing wrong? I expected a pseudo-random sequence to be generated.

MCrossley
  • 123
  • 2
  • 8
  • Oh my, that's a *very* dodgy RNG function they have. – Boann Nov 28 '14 at 11:02
  • That's what I thought. I have reverted to doing the random number seeding and generation in PHP. It means doing two queries but it works consistently. – MCrossley Nov 29 '14 at 15:53

2 Answers2

0

According to documentation RAND(n) is working properly only if n is constant. The effect of using a nonconstant argument is undefined. As of MySQL 5.0.13, nonconstant arguments are not permitted.

And as they say RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.

MySQL documentation

Check what will happen if you use RAND() without parameter.

LHristov
  • 1,103
  • 7
  • 16
  • Well, I have to seed RAND() if I want the same value returned for multiple queries. You can also reproduce the behaviour with statically defined seeds... select rand(0),rand(1),rand(2),rand(3),rand(4),rand(5),rand(6),rand(7),rand(8); select ceil(rand(0)*16),ceil(rand(1)*16),ceil(rand(2)*16),ceil(rand(3)*16),ceil(rand(4)*16),ceil(rand(5)*16),ceil(rand(6)*16),ceil(rand(7)*16),ceil(rand(8)*16); – MCrossley Nov 28 '14 at 13:19
  • To answer your other point, yes, if you seed rand() then use it multiple times, it generates a pseudo-random sequence from the first value. My point is that the first rand() value returned is *very* sensitive to the seed. Repeating to the third decimal place every 4th increment of the seed. – MCrossley Nov 28 '14 at 13:24
  • I think this does answer the question; as RAND() is not meant to be a perfect random generator. – Shaun Cockerill Feb 04 '20 at 07:38
0

RAND() is not meant to be a perfect random generator, so if it is not suitable, then other ways of generating random numbers should be used. In my case, I needed to order rows differently per day, and RAND(CURDATE()+0) proved to be a viable solution.

Yes, it does seem as though the first term has a cycle of 4, and it will be given a similar random number each time the seed increments by 4.

I got the following results from my own basic testing of random number generation and incrementing the seed by 4 each time, using the seed number 0, 4, 8, and 12.

  • Row 1: Incremented by about 0.001
  • Row 2: Incremented by about 0.005
  • Row 3: Incremented by about 0.02
  • Row 4: Incremented by about 0.05
  • Row 5: Incremented by about 0.2
  • Row 6: Decremented by about 0.01 (or maybe it incremented by 0.99)
  • Row 7: Incremented by about 0.26

In OP's example, because they don't use a constant seed number, they technically have a sample size of 1. The impact with recurring indexes is lessened as the row count increases, and there is a higher chance that other items will appear before or after other items when they wouldn't have previously.

A couple of caveats I've worked out from using RAND():

If RAND() is only used as a SELECT column or WHERE condition, and a LIMIT is specified, then RAND() will only be generated once for each of the returned rows.

CREATE TABLE t (i INT);
INSERT INTO t VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);
SELECT i, RAND(0) FROM t LIMIT 0, 1; # 0, 0.15522042769493574
SELECT i, RAND(0) FROM t LIMIT 1, 1; # 1, 0.15522042769493574

If RAND() is used in the ORDER BY statement, then RAND() will be calculated for all matching rows regardless of any LIMIT.

CREATE TABLE t (i INT);
INSERT INTO t VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);
SELECT i, RAND(0) FROM t ORDER BY RAND(0) LIMIT 0, 1; # 0, 0.15522042769493574
SELECT i, RAND(0) FROM t ORDER BY RAND(0) LIMIT 1, 1; # 6, 0.2964166321758336
Shaun Cockerill
  • 800
  • 8
  • 11