
I was experimenting with PHP and decided to generate some random numbers between PHP_INT_MIN (-9223372036854775808) and PHP_INT_MAX (9223372036854775807). I simply echoed the following:

echo rand(-9223372036854775808, 9223372036854775807);

I kept refreshing to see how random the generated numbers were, and I started to notice a pattern: every 2-4 refreshes, 0 appeared, without fail. At one stage 0 even appeared 4 times in a row.

I wanted to experiment further so I created the following snippet:

<?php
$countedZero = 0;
$totalGen = 250;

for ($i = 1; $i <= $totalGen; $i++) {
    $rand = rand(-9223372036854775808, 9223372036854775807);

    if ($rand == 0) {
        echo $i . ": <font color='red'>" . $rand . "</font><br/>";

        $countedZero++;
    } else {
        echo $i . ": " . $rand . "<br/>";
    }
}
echo "0 was generated " . $countedZero . "/" . $totalGen . " times which is " . (($countedZero / $totalGen) * 100) . "%."
?>

This would give me a clear idea of how often 0 is generated. I ran 8 tests:

  • The first group of 3 tests used a $totalGen of 250 (tests 1-3).

  • The second group of 3 tests used a $totalGen of 1000 (tests 4-6).

  • The third group was a single test to see what the results would be on a larger number; I chose 10,000 (test 7).

  • The fourth group was the final test. I was intrigued at this point because the previous (large-number) test gave a surprisingly high result, so I raised the stakes and set $totalGen to 500,000 (test 8).
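The test series described above could also be run in one go from the command line. This is a minimal sketch, not the original snippet: `zeroRate()` and `$fullRange` are hypothetical names introduced here, and the constants PHP_INT_MIN/PHP_INT_MAX (available from PHP 7) stand in for the literal bounds.

```php
<?php
// Hypothetical helper: percentage of zeros among $samples calls to $generator.
function zeroRate(callable $generator, int $samples): float
{
    $zeros = 0;
    for ($i = 0; $i < $samples; $i++) {
        if ($generator() === 0) {
            $zeros++;
        }
    }
    return $zeros / $samples * 100;
}

// Generator under test: rand() over the full integer range.
$fullRange = function () {
    return rand(PHP_INT_MIN, PHP_INT_MAX);
};

// Same sample sizes as the four test groups above.
foreach ([250, 1000, 10000, 500000] as $totalGen) {
    printf("%d samples: %.3f%% zeros\n", $totalGen, zeroRate($fullRange, $totalGen));
}
```

Passing the generator as a callable makes it possible to swap in another function, such as mt_rand, without touching the counting loop.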

Results

I took a screenshot of each result. I kept the first output of every run; I didn't rerun anything to make it fit a certain pattern:

Test 1 (250): screenshots (1), (2), (3)

Test 2 (1000): screenshots (1), (2), (3)

Test 3 (10,000): screenshot (1)

Test 4 (500,000): screenshot (1)

From the above results, 0 appears to have a very high probability of showing up, even when the range of possible numbers is at its maximum. So my question is:

Is there a logical reason why this is happening?

Considering how many numbers it can choose from, why is 0 a recurring number?

Note: Test 8 was originally going to use 1,000,000, but it lagged quite badly, so I reduced it to 500,000. If someone could test 1,000,000 and add the results by editing the OP, it would be much appreciated.

Edit 1

As requested by @maiorano84, I used mt_rand instead of rand, and these were the results:

Test 1 (250): screenshots (1), (2), (3)

Test 2 (1000): screenshots (1), (2), (3)

Test 3 (10,000): screenshot (1)

Test 4 (500,000): screenshot (1)

As you can see, 0 still has a high probability of showing up with mt_rand. Of the two functions, rand gave the lowest result.

Update

It seems that in PHP 7, using the new function random_int fixes the issue.


Example PHP7 random_int

https://3v4l.org/76aEH
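For readers who cannot open the link, here is a minimal sketch of the same zero-counting test with random_int. It is not necessarily the code behind the link above; the sample size of 10,000 is an arbitrary choice, and PHP 7 or later is assumed.

```php
<?php
// Count zeros from random_int() over the full integer range (PHP 7+).
$totalGen = 10000; // sample size chosen for illustration only
$countedZero = 0;

for ($i = 0; $i < $totalGen; $i++) {
    if (random_int(PHP_INT_MIN, PHP_INT_MAX) === 0) {
        $countedZero++;
    }
}

printf(
    "0 was generated %d/%d times which is %.2f%%.\n",
    $countedZero,
    $totalGen,
    $countedZero / $totalGen * 100
);
```

With a uniform generator over a range this wide, a zero in a run of this size would be extremely unlikely.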

Script47
  • What happens when using [mt_rand](http://php.net/manual/en/function.mt-rand.php)? – maiorano84 Oct 26 '15 at 02:50
  • @maiorano84 I'm not sure, I haven't tested that yet. Do you expect the results to be a lot different? – Script47 Oct 26 '15 at 02:51
  • I've heard that mt_rand will generate a more reliable random number than rand. The complexities behind *why* that is are beyond me, though. – maiorano84 Oct 26 '15 at 02:52
  • I don't think it's a PHP issue; it's a stats/math issue. I can't explain it, but I think if you asked on http://math.stackexchange.com/ you would get an answer. –  Oct 26 '15 at 02:59
  • @Dagon My initial thought was to post it on that site, but I thought I'd first see whether it has something to do with how `PHP` generates numbers or some other programming-related reason. If I can't get an answer here, I could ask to have it migrated (if that is possible) to see if I can get an answer there. – Script47 Oct 26 '15 at 03:01
  • @maiorano84 I posted the results when using `mt_rand`; look for **Edit 1**. – Script47 Oct 26 '15 at 03:35
  • https://3v4l.org/N1Mn5 – Scott Arciszewski Oct 26 '15 at 19:18
  • Down-voter, no doubt this is to do with the latest meta SO incident, hope it was worth it. – Script47 May 17 '18 at 18:01

2 Answers


This is basically an example of how someone wrote a bad rand() function. When you specify the min/max range in rand(), you hit a part of PHP's source that just results in imperfect distribution in the PRNG.

Specifically lines 44-45 of php_rand.h in php-src, which is the following macro:

#define RAND_RANGE(__n, __min, __max, __tmax) \
    (__n) = (__min) + (zend_long) ((double) ( (double) (__max) - (__min) + 1.0) * ((__n) / ((__tmax) + 1.0)))

From higher up the call stack (lines 300-302 in rand.c of php-src):

if (argc == 2) {
    RAND_RANGE(number, min, max, PHP_RAND_MAX);
}

RAND_RANGE is the macro defined above. If you drop the range parameters and call rand() instead of rand(-9223372036854775808, 9223372036854775807), this scaling step is skipped and you get an even distribution again.
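One way to see where the skew might come from is to redo only the floating-point part of the macro in PHP. The sketch below assumes a 64-bit build and a PHP_RAND_MAX of 2^31 - 1; both are assumptions, and `scaledOffset()` is a hypothetical helper, not something from php-src. It does not perform the final cast to zend_long; it only reports whether the scaled value would still fit.

```php
<?php
// Hypothetical helper: the floating-point part of RAND_RANGE, i.e. the value
// that the macro casts to zend_long before adding it to the minimum.
function scaledOffset(int $n, int $min, int $max, int $tmax): float
{
    return ((float) $max - (float) $min + 1.0) * ($n / ($tmax + 1.0));
}

$tmax  = 2147483647; // assumed value of PHP_RAND_MAX (2^31 - 1)
$limit = 2.0 ** 63;  // smallest double that no longer fits a signed 64-bit integer

// Raw generator outputs around the midpoint of the assumed range.
foreach ([0, 1, 1073741823, 1073741824, 2147483647] as $n) {
    $offset = scaledOffset($n, PHP_INT_MIN, PHP_INT_MAX, $tmax);
    printf("n=%d offset=%.0f fits=%s\n", $n, $offset, $offset < $limit ? 'yes' : 'no');
}
```

Under these assumptions the offset stops fitting once n reaches half of the assumed maximum, so about half of all raw values push the cast out of range. What the C cast yields at that point is not defined by the C standard, which would be consistent with the roughly 50% rate of zeros reported in this thread.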

Here's a script to demonstrate the effects...

function unevenRandDist() {
    $r = [];
    for ($i = 0; $i < 10000; $i++) {
        $n = rand(-9223372036854775808, 9223372036854775807);
        if (isset($r[$n])) {
            $r[$n]++;
        } else {
            $r[$n] = 1;
        }
    }
    arsort($r);
    // you should see 0 well above average in the top 10 here
    var_dump(array_slice($r, 0, 10));
}

function evenRandDist() {
    $r = [];
    for ($i = 0; $i < 10000; $i++) {
        $n = rand();
        if (isset($r[$n])) {
            $r[$n]++;
        } else {
            $r[$n] = 1;
        }
    }
    arsort($r);
    // you should see the top 10 are about identical
    var_dump(array_slice($r, 0, 10));
}

unevenRandDist();
evenRandDist();

Sample Output I Got

array(10) {
  [0]=>
  int(5005)
  [1]=>
  int(1)
  [2]=>
  int(1)
  [3]=>
  int(1)
  [4]=>
  int(1)
  [5]=>
  int(1)
  [6]=>
  int(1)
  [7]=>
  int(1)
  [8]=>
  int(1)
  [9]=>
  int(1)
}
array(10) {
  [0]=>
  int(1)
  [1]=>
  int(1)
  [2]=>
  int(1)
  [3]=>
  int(1)
  [4]=>
  int(1)
  [5]=>
  int(1)
  [6]=>
  int(1)
  [7]=>
  int(1)
  [8]=>
  int(1)
  [9]=>
  int(1)
}

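Notice the inordinate difference in the number of times 0 shows up in the first array vs. the second array. Because array_slice() renumbers integer keys by default, the keys in the dump are positions, not the generated values. Also note that the two calls do not draw from the same range: rand() without arguments returns a value between 0 and getrandmax(), not between PHP_INT_MIN and PHP_INT_MAX, so the comparison only shows that the unscaled output has no spike at 0.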


I guess you could blame PHP for this, but it's important to note that glibc's rand is not known for generating good random numbers (regardless of crypto). This problem in glibc's implementation of rand is known, as pointed out by this SO answer.

Sherif

I took a quick look at your script and ran it from the command line. The first thing I noticed was that, because I was running a 32-bit version of PHP, my integer minimum and maximum were different from yours.

Because I was using your original values, I was actually getting 0 100% of the time. I resolved this by modifying the script like so:

$countedZero = 0;
$totalGen = 1000000;

for ($i = 1; $i <= $totalGen; $i++) {
    $rand = rand(~PHP_INT_MAX, PHP_INT_MAX);

    if ($rand === 0) {
        //echo $i . ": <font color='red'>" . $rand . "</font><br/>";

        $countedZero++;
    } else {
        //echo $i . ": " . $rand . "<br/>";
    }
}
echo "0 was generated " . $countedZero . "/" . $totalGen . " times which is " . (($countedZero / $totalGen) * 100) . "%.";

I was able to confirm that each test would yield just shy of a 50% hit rate for 0.
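As a side note on switching from the literal bounds to ~PHP_INT_MAX and PHP_INT_MAX: the constants adapt to the build, while the literal from the question does not. A small sketch of the difference, where `$literalMin` is a name introduced here for illustration and PHP 7+ is assumed for PHP_INT_MIN:

```php
<?php
// The digits 9223372036854775808 exceed the largest 64-bit integer, so PHP
// parses them as a float before the unary minus is applied.
$literalMin = -9223372036854775808;

var_dump(is_float($literalMin));        // whether the literal lower bound is a float
var_dump(~PHP_INT_MAX === PHP_INT_MIN); // whether the complement of the max is the min
var_dump(PHP_INT_SIZE);                 // integer size in bytes for this build
```

Both checks should report true, which is why ~PHP_INT_MAX (or PHP_INT_MIN) is the more portable way to write the lower bound.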

Here's the interesting part, though:

$rand = rand(~PHP_INT_MAX+1, PHP_INT_MAX-1);

Altering the range to these values causes the likelihood of zero coming up to plummet to an average of 0.003% (after 8 tests). The weird part was that when I checked the values of $rand that were not zero, I saw many values of 1 and many random negative numbers. No positive numbers greater than 1 were showing up.

After changing the range to the following, I was able to see consistent behavior and more randomization:

$rand = rand(~PHP_INT_MAX/2, PHP_INT_MAX/2);

Here's what I'm pretty sure is happening:

Because you're dealing with a range here, you have to take into account the difference between the minimum and the maximum, and whether or not PHP can support that value.

In my case, the minimum that PHP is able to support is -2147483648 and the maximum is 2147483647, but the difference between them ends up being 4294967295, a much larger number than PHP can store in an integer, so it truncates the maximum in order to try to manage that value.

Ultimately, if the difference of your minimum and maximum exceeds the PHP_INT_MAX constant, you're going to see unexpected behavior.
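The rule of thumb above can be checked before calling rand(). This is a minimal sketch, assuming PHP 7+; `rangeWidthFitsInt()` is a hypothetical helper, and intdiv() is used for the halved bounds instead of the `/` operator from the snippet above so that both arguments stay integers.

```php
<?php
// Hypothetical helper: true when $max - $min still fits in a PHP integer.
// PHP converts an overflowing integer subtraction to float, so is_int()
// on the result detects the overflow.
function rangeWidthFitsInt(int $min, int $max): bool
{
    return is_int($max - $min);
}

var_dump(rangeWidthFitsInt(PHP_INT_MIN, PHP_INT_MAX));                       // full range
var_dump(rangeWidthFitsInt(PHP_INT_MIN + 1, PHP_INT_MAX - 1));               // slightly narrower
var_dump(rangeWidthFitsInt(intdiv(PHP_INT_MIN, 2), intdiv(PHP_INT_MAX, 2))); // halved range
```

The first two calls report false and the halved range reports true, which lines up with the ranges that misbehaved and the one that behaved consistently in the tests above.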

maiorano84
  • Yes, a 64-bit install has higher values; I researched that before posting this question and also put it in my OP to make it clear. I too noticed when skimming through the generated numbers that positives were missing. I'll give your idea a go with my numbers (`rand(~PHP_INT_MAX/2, PHP_INT_MAX/2);`). – Script47 Oct 26 '15 at 03:52
  • After testing your theory, it seems that you are correct. I tested it twice on `250` and it generated `0` `0` times and then pushed it up to `250,000` and it again generated `0` `0` times. After reading your answer I understand why it shows negatives only, but still I'm not sure why so many `0`'s. I would've thought that it would be only negatives considering the range? I +1'd your answer but I'm going to hold off accepting it because I'm not fully convinced that these 0's are just the result of unexpected behaviour. If I get no other responses, I will accept this as the answer. – Script47 Oct 26 '15 at 03:58