14

I have been using arc4random() and arc4random_uniform() and I always had the feeling that they wasn't exactly random, for example, I was randomly choosing values from an Array but often the values that came out were the same when I generated them multiple times in a row, so today I thought that I would use an Xcode playground to see how these functions are behaving, so I first tests arc4random_uniform to generate a number between 0 and 4, so I used this algorithm :

import Cocoa

var number = 0

for i in 1...20 {
    number = Int(arc4random_uniform(5))
}

And I ran it several times, and here is how to values are evolving most of the time :
enter image description here enter image description here

So as you can see the values are increasing and decreasing repeatedly, and once the values are at the maximum/minimum, they often stay at it during a certain time (see the first screenshot at the 5th step, the value stays at 3 during 6 steps, the problem is that it isn't at all unusual, the function actually behaves in that way most of the time in my tests.

Now, if we look at arc4random(), it's basically the same :
enter image description here enter image description here

So here are my questions :

  • Why is this function behaving in this way ?
  • How to make it more random ?

Thank you.

EDIT :
Finally, I made two experiments that were surprising, the first one with a real dice :
enter image description here
What surprised me is that I wouldn't have said that it was random, since I was seeing the same sort of pattern that as described as non-random for arc4random() & arc4random_uniform(), so as Jean-Baptiste Yunès pointed out, humans aren't good to see if a sequence of numbers is really random.

I also wanted to do a more "scientific" experiment, so I made this algorithm :

import Foundation

var appeared = [0,0,0,0,0,0,0,0,0,0,0]
var numberOfGenerations = 1000

for _ in 1...numberOfGenerations {
    let randomNumber = Int(arc4random_uniform(11))
    appeared[randomNumber]++
}

for (number,numberOfTimes) in enumerate(appeared) {
    println("\(number) appeard \(numberOfTimes) times (\(Double(numberOfGenerations)/Double(numberOfTimes))%)")
}

To see how many times each number appeared, and effectively the numbers are randomly generated, for example, here is one output from the console :
0 appeared 99 times.
1 appeared 97 times.
2 appeared 78 times.
3 appeared 80 times.
4 appeared 87 times.
5 appeared 107 times.
6 appeared 86 times.
7 appeared 97 times.
8 appeared 100 times.
9 appeared 91 times.
10 appeared 78 times.

So it's definitely OK

EDIT #2 : I made again the dice experiment with more rolls, and it's still as surprising to me :
enter image description here

Pop Flamingo
  • 3,023
  • 2
  • 26
  • 64
  • Check this: http://stackoverflow.com/questions/56648/whats-the-best-way-to-shuffle-an-nsmutablearray – Mrunal Jan 07 '15 at 06:41
  • @Mrunal Aren't they using the same functions than I do? – Pop Flamingo Jan 07 '15 at 06:43
  • @Mrunal My problem isn't about how to replace elements in an array, just how to generate really random numbers. :) – Pop Flamingo Jan 07 '15 at 06:43
  • Oh ok... Random numbers can be in repetitive manner. It requires to handle on developer end. Check this now: http://stackoverflow.com/questions/1617630/non-repeating-random-numbers – Mrunal Jan 07 '15 at 06:47
  • 3
    A sequence of random numbers can contain a subsequence of repetitive numbers... – Jean-Baptiste Yunès Jan 07 '15 at 06:51
  • Your question is unclear to me. You claim that *"once the values are at the maximum/minimum, they often stay at it during a certain time"*. But the repeated value 3 in picture #1 is not the "maximum" value (and there will always be repeated values, as Jean-Baptiste said). – And what is *"basically the same"* in picture #3 and #4? I cannot see anything unusual there. – Martin R Jan 07 '15 at 07:10
  • 1
    If you tossed a coin and got alternating heads and tails, would you consider that "random enough"? – jrturton Jan 07 '15 at 07:28
  • @jrturton To you mean something like : Tail, Head, Tail, head, tail, head... ? If yes, no, but would I would NOT considered as random would be if I was getting a dozen times head, then a dozen times tail, and so on... all the time. – Pop Flamingo Jan 07 '15 at 07:35
  • @MartinR Basically, what I don't like about this function is that it looks like most of the numbers that are generated are close to the "generation bounds", if I say that I want a number between 0 and 10, it looks like most of the numbers will be close to 0 or 10, but never that much close to the "middle". – Pop Flamingo Jan 07 '15 at 07:37
  • If you generated random numbers for long enough you could expect to see that. I was making the point that seeing repeated values _is expected_ with randomness. It doesn't take previous values into account, so each subsequent value is equally probable. – jrturton Jan 07 '15 at 07:39
  • @jrturton Yes I understand, but I meant if it was always like this. – Pop Flamingo Jan 07 '15 at 07:44
  • 1
    This is a visual artifact! Try to use another kind of visualization, random points in a circle for example... You will be able to observe more easily that the distribution is "uniform". In your "1D" plotting, just try to swap between values for example take a graph and swap the line for value 6 with line for value 0, you will observe that the graph is almost the same! – Jean-Baptiste Yunès Jan 07 '15 at 11:02

3 Answers3

11

A true random sequence of numbers cannot be generated by an algorithm. They can only produce pseudo-random sequence of numbers (something that looks like a random sequence). So depending on the algorithm chosen, the quality of the "randomness" may vary. The quality of arc4random() sequences is generally considered to have a good randomness.

You cannot analyze the randomness of a sequence visually... Humans are very bad to detect randomness! They tend to find some structure where there is no. Nothing really hurts in your diagrams (except the rare subsequence of 6 three in-a-row, but that is randomness, sometimes unusual things happens). You would be surprised if you had used a dice to generate a sequence and draw its graph. Beware that a sample of only 20 numbers cannot be seriously analyzed against its randomness, your need much bigger samples.

If you need some other kind of randomness, you can try to use /dev/random pseudo-file, which generate a random number each time you read in. The sequence is generated by a mix of algorithms and external physical events that ay happens in your computer.

Jean-Baptiste Yunès
  • 34,548
  • 4
  • 48
  • 69
  • When you told me that I would be surprised if I has used a dice... well that's exactly what I just done, and yes, I've been surprised : http://imgur.com/DZHfTUs If I had seen this graph and didn't knew it was a dice, I would have said that it was a bad random generator. Anyway, I'm still surprised that in my random generations, the values seem to be close to the maximum and the minimum of the range of numbers that arc4random can generate, even if I don't get exactly what it means, I might be searching like the discrepancy sequence Jeffrey Thomas is mentioning. – Pop Flamingo Jan 07 '15 at 07:31
  • I made an algorithm to see how many times each number appears, and that's definitely random ! For example, for 1000 generations of numbers between 0 and 10, I had : 0 appeard 101 times (9.9009900990099%) 1 appeard 87 times (11.4942528735632%) 2 appeard 79 times (12.6582278481013%) 3 appeard 90 times (11.1111111111111%) 4 appeard 95 times (10.5263157894737%) 5 appeard 106 times (9.43396226415094%) 6 appeard 90 times (11.1111111111111%) 7 appeard 84 times (11.9047619047619%) 8 appeard 78 times (12.8205128205128%) 9 appeard 106 times (9.43396226415094%) 10 appeard 84 times (11.9047619047619%) – Pop Flamingo Jan 07 '15 at 07:55
  • 2
    Be really careful, testing against randomness it a true challenge! May I suggest you to read http://en.wikipedia.org/wiki/Randomness_tests. There is a lot of software suite to test this... – Jean-Baptiste Yunès Jan 07 '15 at 09:37
4

It depends on what you mean when you say random.

As stated in the comments, true randomness is clumpy. Long strings of repeats or close values are expected.

If this doesn't fit your requirement, then you need to better define your requirement.

Other options could include using a shuffle algorithm to dis-order things in an array, or use an low-discrepancy sequence algorithm to give a equal distribution of values.

Jeffery Thomas
  • 42,202
  • 8
  • 92
  • 117
  • I am not getting exactly what a low-discrepancy sequence is, is it that if I had used such a function in my algorithm, I would have got really fast as many 0, 1, 2, 3 and 4 ? (for my arc4random_uniform(5)) ? – Pop Flamingo Jan 07 '15 at 07:33
  • 3
    @TrevörAnneDenise low-discrepancy sequence has the look of random numbers, but each value is carefully picked to avoid clumpiness. To humans, it looks more random than actual randomness. – Jeffery Thomas Jan 07 '15 at 14:08
0

I don’t really agree with the idea of humans who are very bad to detect randomness. Would you be satisfied if you obtain 1-1-2-2-3-3-4-4-5-5-6-6 after throwing 6 couples of dices ? however the dices frequencies are perfect…

This is exactly the problem i’m encountering with arc4random or arc4random_uniform functions. I’m developing a backgammon application since many years which is based on a neural network trained by word champions players. I DO know that it plays much better than any one but many users think it is cheating. I also have doubts sometimes so I’ve decided to throw all dices by myself…

I’m not satisfied at all with arc4random, even if frequencies are OK. I always throw a couple of dices and results lead to unacceptable situations, for example : getting five consecutive double dices for the same player, waiting 12 turns (24 dices) until the first 6 occurs.

It is easy to test (C code) :

void randomDices ( int * dice1, int * dice2, int player )
{
    ( * dice1 ) = arc4random_uniform ( 6 ) ;
    ( * dice2 ) = arc4random_uniform ( 6 ) ;

    // Add to your statistics
    [self didRandomDice1:( * dice1 ) dice2:( * dice2 )  forPlayer:player] ;
}

Maybe arc4random doesn’t like to be called twice during a short time…

So I’ve tried several solutions and finally choose this code which runs a second level of randomization after arc4random_uniform :

int CFRandomDice ()
{
    int __result = -1 ;

    BOOL __found = NO ;

    while ( ! __found )
    {
        // random int big enough but not too big
        int __bigint = arc4random_uniform ( 10000 ) ;

        // Searching for the first character between '1' and '6'
        // in the string version of bigint :

        NSString * __bigString = @( __bigint ).stringValue ;

        NSInteger __nbcar = __bigString.length ;
        NSInteger __i = 0 ;

        while ( ( __i < __nbcar ) && ( ! __found ) )
        {
            unichar __ch = [__bigString characterAtIndex:__i] ;
            if ( ( __ch >= '1' ) && ( __ch <= '6' ) )
            {
                __found = YES ;
                __result = __ch - '1' + 1 ;
            }
            else
            {
                __i++ ;
            }
        }
    }

    return ( __result ) ;
}

This code create a random number with arc4random_uniform ( 10000 ), convert it to string and then searches for the first digit between ‘1’ and ‘6’ in the string.

This appeared to me as a very good way to randomize the dices because : 1/ frequencies are OK (see the statistics hereunder) ; 2/ Exceptional dice sequences occur at exceptional times.

10000 dices test:
----------
Game Stats
----------
HIM :
Total 1 = 3297
Total 2 = 3378
Total 3 = 3303
Total 4 = 3365
Total 5 = 3386
Total 6 = 3271
----------
ME :
Total 1 = 3316
Total 2 = 3289
Total 3 = 3282
Total 4 = 3467
Total 5 = 3236
Total 6 = 3410
----------
HIM doubles = 1623
ME doubles = 1648

Now I’m sure that players won’t complain…

Steffen Moritz
  • 7,277
  • 11
  • 36
  • 55
  • So you're pinning your intuition against all the mathematical theory and empirical statistical testing backing `arc4random`? Disagreeing with the observation that [humans are bad at judging randomness](https://quotulatiousness.ca/blog/2013/02/20/people-are-terrible-judges-of-randomness-that-is-why-we-invented-statistics/) is an opinion, not a fact-based statement. It's been very well [documented](http://cocosci.princeton.edu/tom/papers/LabPublications/WilliamsPeopleBadRandom.pdf). – pjs Jul 03 '19 at 16:18