26

What's the definition of bias in:

The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32.

If it's the kind of bias stated in alternate tie-breaking rules for rounding, I don't think it really matters (since the bias is not really visible).

Besides, mt_rand() is claimed to be four times faster than rand(), just by adding three chars in front!

Assuming mt_rand is available, what's the disadvantage of using it?

Spudley
Pacerier
  • 3
    "just by adding three chars in front" changes it from one function call to another, which works entirely differently. Why would it surprise you that it goes three times faster? Would it have been more convincing if they'd named the function `really_fast_rand()`? ;-) – Spudley Oct 18 '11 at 13:38
  • There are none. mt_rand is just another function that is much better than the old function at producing random-ish numbers – OptimusCrime Oct 18 '11 at 13:48
  • 2
    @Spudley I mean.. that's a joke – Pacerier Oct 18 '11 at 14:15
  • @Pacerier - I thought it was.... that's why I put a winky smiley at the end of my comment. :) – Spudley Oct 18 '11 at 14:19
  • @Spudley lol ok, even :) – Pacerier Oct 18 '11 at 14:25
  • 2
    Wait. If mt_rand has 3 more characters in front of it, doesn't that mean it would end up executing 3 times slower? The syntax parser has to process those 3 extra characters. Just saying... ;) – OCDev Dec 28 '14 at 00:13

2 Answers

57

mt_rand uses the Mersenne Twister algorithm, which is far better than the linear congruential generator (LCG) typically used by rand. For example, a typical 32-bit LCG has a period of at most a measly 2^32, whereas the period of mt_rand is 2^19937 − 1. In addition, all the values generated by an LCG lie on lines or planes when plotted in a multidimensional space, and it is not only practically feasible but relatively easy to determine the parameters of an LCG from its output. The only advantage LCGs have is being potentially slightly faster, but on a scale that is completely irrelevant when coding in PHP.
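
To make the period point concrete, here is a minimal sketch of a toy LCG with a deliberately tiny modulus. The parameters (a = 5, c = 3, m = 16) and the function names are made up for illustration; they are not what PHP or any libc uses.

```php
<?php
// Hypothetical toy LCG: x_{n+1} = (a * x_n + c) mod m.
function lcg_next(int $x, int $a, int $c, int $m): int
{
    return ($a * $x + $c) % $m;
}

// Count how many steps the generator takes before it reaches a state
// it has already visited. The count can never exceed the modulus $m.
function lcg_steps_until_repeat(int $seed, int $a, int $c, int $m): int
{
    $seen = [];
    $x = $seed % $m;
    $steps = 0;
    while (!isset($seen[$x])) {
        $seen[$x] = true;
        $x = lcg_next($x, $a, $c, $m);
        $steps++;
    }
    return $steps;
}

// Toy parameters chosen so that the generator cycles through all 16 states.
echo lcg_steps_until_repeat(1, 5, 3, 16), "\n"; // prints 16
```

With these toy parameters the generator visits each of the 16 states once and then replays the same sequence. A 32-bit LCG has the same ceiling, only at 2^32, which is tiny next to 2^19937 − 1.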

However, mt_rand is not suitable for cryptographic purposes (generation of tokens, passwords or cryptographic keys) either.

If you need cryptographic randomness, use random_int in PHP 7. On older PHP versions, read from /dev/urandom or /dev/random on a POSIX-conforming operating system.
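
A minimal sketch of the PHP 7 route, assuming PHP 7.0 or later. The helper name make_token is hypothetical, and random_bytes (also available from PHP 7) is my addition for the token case; it is not mentioned above.

```php
<?php
// random_int() draws from the operating system's CSPRNG and returns a
// uniformly distributed integer in the inclusive range given.
$roll = random_int(1, 6);

// Hypothetical helper: build a hex token from $bytes random bytes.
// random_bytes() uses the same CSPRNG source as random_int().
function make_token(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

echo $roll, "\n";
echo make_token(), "\n"; // 32 hex characters for the default of 16 bytes
```

Both functions throw an exception if no suitable source of randomness is available, so a failure is visible rather than silently falling back to something weaker.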

phihag
  • 13
    For a visual representation of the difference: [link](http://tjl.co/blog/code/followup-php-rand-vs-mt_rand/) – James Alday Oct 18 '11 at 13:51
  • @JamesAlday Great link! Added it to the answer. – phihag Oct 18 '11 at 13:54
  • 3
    Worth noting that `openssl_random_pseudo_bytes()` is only available from PHP 5.3. (another good reason to make sure you're on the latest php release) – Spudley Oct 18 '11 at 13:54
  • To confirm, your answer means there is **no** advantage in rand() over mt_rand(), and rand() should be deprecated, right? – Pacerier Oct 18 '11 at 14:16
  • @Pacerier LCGs are faster than Mersenne Twister, so if you don't care about the quality of randomness at all (for example, if you want to show one of 6 random pictures on the frontpage of your website), `rand` is still adequate. Therefore, it's unlikely to be deprecated, but you're right in that if you care about the quality of randomness, you should use `mt_rand`. – phihag Oct 18 '11 at 14:29
  • @phihag wait I'm getting confused, isn't `mt_rand` supposed to be 4 times faster than the algorithm used by `rand` ? – Pacerier Oct 18 '11 at 17:05
  • 1
    @Pacerier The php doc says so, but conveniently fails to compare to a specific algorithm and architecture. Mersenne twister is really fast, and since you're programming in php anyways, any supposed speed difference doesn't really matter anyway. You're right in that **there is no reason to use `rand` over `mt_rand`** unless you want to use the precise implementation of a specific glibc on an embedded system or a very special scientific simulation. – phihag Oct 18 '11 at 17:36
  • Linkrot on the "lie on lines or planes", sadly. – Kzqai Jan 07 '15 at 23:37
  • @Kzqai Thanks for the notice; fixed. – phihag Jan 08 '15 at 12:32
  • @JamesAlday the image in the link no longer seems to work, could not find a similar site :-( – dangel May 24 '15 at 01:59
  • Yeah, and I'm not even sure now what the original image looked like! But I think that [this SO post](http://stackoverflow.com/questions/26230210/is-mt-rand-more-secure-than-rand) has some nice images that get the point across... – James Alday May 28 '15 at 14:44
  • The image in [the link](http://web.archive.org/web/20140801002727/http://tjl.co/blog/code/followup-php-rand-vs-mt_rand/) works fine for me. – phihag May 28 '15 at 16:37
  • Is it okay if I edit in an update about `random_int()`? :) – Scott Arciszewski Dec 08 '15 at 06:48
  • Updated with `random_int`. – phihag Dec 08 '15 at 08:46
  • `openssl_random_pseudo_bytes()` is *not* a secure PRNG. See [this guide](https://paragonie.com/blog/2015/07/how-safely-generate-random-strings-and-integers-in-php) for more information and instructions on how to do it correctly. – rugk Aug 24 '16 at 15:28
9

The distribution quirk that you quoted is only relevant when the random number range you're generating is larger than 2^32. That is 4294967296.

If you're working with numbers that big, and you need them to be randomised, then perhaps this is a reason to reconsider using mt_rand(). However, if you're working with numbers smaller than this, then it is irrelevant.

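It happens because the generator's raw output does not have enough bits of precision for those high ranges: the raw values have to be scaled up to fill the requested range, and the scaling cannot reach every value in it.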
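
One way to picture the precision problem is a toy model with much smaller numbers. This is only a sketch under my own assumptions, not PHP's actual source: a generator that can produce just 8 distinct raw values is stretched over a range of 16 by simple scaling. The function name reachable_outputs is hypothetical.

```php
<?php
// Toy model: map every possible raw value of a generator with $rawCount
// distinct outputs onto the range 0 .. $range - 1 by scaling, and return
// the distinct results that can actually occur.
function reachable_outputs(int $rawCount, int $range): array
{
    $reachable = [];
    for ($raw = 0; $raw < $rawCount; $raw++) {
        $reachable[] = intdiv($raw * $range, $rawCount);
    }
    return array_values(array_unique($reachable));
}

// 8 raw values stretched over a range of 16: only even results appear.
echo implode(', ', reachable_outputs(8, 16)), "\n"; // prints 0, 2, 4, 6, 8, 10, 12, 14
```

When the range is no larger than the number of raw values, every result is reachable, which is why the quirk only shows up for very large ranges.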

I've never worked with random numbers that large, so I've never needed to worry about it.

The difference between rand() and mt_rand() is a lot more than "just three extra characters". They are entirely different function calls, and work in completely different ways, just as you wouldn't expect print() and print_r() to be similar.

mt_rand() gets its name from the "Mersenne Twister" algorithm it uses to generate the random numbers. This algorithm is known to be a quick, efficient and high quality random number generator, which is why it is available in PHP.

The older rand() function calls the random number generator of the C library (libc) on your system rather than an implementation built into PHP. This means that it uses whatever generator that library happens to provide, so its quality and speed are outside PHP's control. In general, that default generator uses an older algorithm, hence the claim that mt_rand() is quicker, but it will vary from system to system.

Therefore, for virtually all uses, mt_rand() is a better function to use than rand().

You say "assuming mt_rand() is available", but it always will be since it was introduced way back in PHP 4.

Spudley
  • Isn't it true that `rand` uses libc and not the *system default/selected* ? – Pacerier Oct 18 '11 at 14:18
  • @Pacerier - libc is what I meant; the point is that it makes a call to a library that is external to PHP, and therefore out of the control of PHP in terms of quality/speed/etc. – Spudley Oct 18 '11 at 14:25