
I came across the function randint() in Python, which gives you a random integer within a given range. The thing I am not able to digest is how we can really simulate randomness. How can I tell that a random function, in any programming language, doesn't give a biased result? Bell curve?

How can we simulate something as natural as randomness? We can calculate the probability of a result appearing, but we can never tell how this actually works.

To simulate something, we need complete insight into the topic, don't we?

brainst
  • This is more of a physics question. You *don't* have true randomness when using rand() libraries in most languages. – Idos Feb 20 '16 at 09:04
  • You can't _simulate_ randomness because then it's not random anymore. What you _could_ do (although it's more complicated) is measure truly random events. I believe at random.org they use atmospheric noise, and some others use some property of the electrical currents in the CPU. – Arc676 Feb 20 '16 at 09:07
  • True random generators are hardware only: http://www.idquantique.com/random-number-generation/quantis-random-number-generator/ – Andrey Smorodov Feb 20 '16 at 09:07
  • Why did you mention the bell curve here, if I may ask? The bell curve applies to the normal distribution. Does randomness have to be normal? – Sнаđошƒаӽ Feb 20 '16 at 09:23
  • @Shadowfax well I was just trying to make my question clear, considering my limited knowledge :D ... Please edit the question if you think it could be explained better. Thanks! – brainst Feb 20 '16 at 10:05

4 Answers


Quick Answer:


Entropy is introduced into computer algorithms to kick off a deterministic sequence of numbers, which clever algorithms then fit to the distribution the user asked for; e.g. rand(1,10) might internally produce floats from 0.0 to 0.9999 but needs to map them to the integers 1 through 10. Because it is difficult to know what that entropy was, the numbers are difficult to determine, hence the description "pseudo-random generator".

In statistics we learn that flipping a coin 100 times may not result in 50 heads and 50 tails, nor should it, since coin flips don't work that way. Still, it's a good example: the probability only holds over an infinite number of iterations of an even distribution. It is possible for heads to show up 100, 1,000, or even 10,000 times in a row. Not likely, but possible. An algorithm that simulates randomness is under no obligation to ensure that a 0 is ever returned, even if 0 is among the list of possible answers; it only needs to ensure that it's possible.
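As a rough sketch of that internal mapping (this rand_int helper is hypothetical; real libraries take more care to avoid rounding bias):

```python
import random

def rand_int(low, high):
    # Hypothetical helper: map a uniform float in [0.0, 1.0)
    # onto the integers low..high inclusive, as described above.
    u = random.random()                   # uniform float in [0.0, 1.0)
    return low + int(u * (high - low + 1))

print(rand_int(1, 10))                    # an integer from 1 through 10
```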

General Answer


Most computer-generated random numbers are pseudo-random.

As you've alluded to in your question, computers cannot simulate true randomness; all random algorithms generate numbers deterministically. If one knows the initial seed of the algorithm, the entropy used by the algorithm, and which iteration the algorithm is on, the 'random' number can be determined. True randomness can only be achieved by observing the outcome of a random event, such as the physical behavior of computer components or other natural phenomena.
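A minimal sketch of that determinism in Python: two generators given the same seed produce the same 'random' sequence.

```python
import random

# Two generators seeded identically produce identical sequences,
# which is exactly what makes a PRNG deterministic.
a = random.Random(12345)
b = random.Random(12345)

print([a.randint(1, 10) for _ in range(5)])
print([b.randint(1, 10) for _ in range(5)])  # the same five numbers
```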

One could argue that natural randomness is not in reality random, just an unknown sequence of events. Not unknowable (i.e. entropy), just currently unknown: it is only unknown (random) because we are unable to explain or predict it at this time, due to insufficient advancement in technology or knowledge. There is truly chaotic entropy, but unless we're talking about quantum computers, it doesn't matter. For the purposes of most applicable software, a very good pseudo-random number generator is all that's required.

A thousand years ago we would have said the ocean was prone to random tsunamis. Now we have advanced enough technology and understanding to create prediction models, and these models become more accurate as we enter more information about the events that lead up to a tsunami.

The part that is difficult for a computer to simulate is entropy. Entropy is, at its simplest, randomness. When generating a tuple of prime numbers, the algorithm used to create the series of random numbers will often collect entropy from outside sources: mouse movements, electrical noise, or 'noise' collected from an antenna such as the built-in WiFi or Bluetooth. Entropy is the key to creating a good set of simulated random numbers.
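In Python, one way to reach the operating system's entropy pool (fed by sources like those above) is through os.urandom or random.SystemRandom; a minimal sketch:

```python
import os
import random

# os.urandom draws from the OS entropy pool, which is fed by
# unpredictable events such as interrupt timing and device noise.
print(os.urandom(16).hex())

# random.SystemRandom uses that same pool instead of the
# deterministic Mersenne Twister behind random.random().
sr = random.SystemRandom()
print(sr.randint(1, 10))
```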

Even with all the advancements we have made in collecting entropy, a machine can still be induced to generate a specific set of entropy, which would then allow an attacker to accurately predict the numbers being generated. If the algorithm collects noise from a microphone, an attacker can produce a loud and predictable noise at the right time in order to influence the sequence of numbers generated later. The same can be said of all the other ways of gathering entropy.

A simple way to get true randomness is to use Random.org.

The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs.

Rikka

Since you're going to simulate randomness, you are going to end up using a pseudorandom number generator (PRNG). The topic is widely covered.

Python's random module already uses the Mersenne twister, and my guess is that you do not need anything better than that unless you're working on a cryptography tool.
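A minimal sketch of that split in the standard library: random for simulation-grade numbers, secrets for cryptographic ones.

```python
import random
import secrets

# Mersenne Twister: fine for simulations, games, and sampling.
print(random.randint(1, 100))

# secrets is backed by the OS CSPRNG: use it for tokens and keys.
print(secrets.randbelow(100) + 1)   # cryptographically strong 1..100
print(secrets.token_hex(16))        # e.g. a session token
```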

Now, if you want a truly random signal, it has to have a physical nature (e.g. a Geiger counter). But in most cases you do not need to go that far.

The answer to your question depends hugely on the purpose of the randomness in your application.

archived

I'll start by asking: what does it mean to be random? The word is shorthand for outcomes that are a priori unpredictable. You can attempt to quantify the degree of unpredictability using measures such as entropy, but randomness itself is a binary state: either an event can be predicted with certainty (entropy = 0), or it is random. Different probability distributions, such as the bell curve (normal) or the uniform, have different amounts of entropy, but they all qualify as random because their entropy is non-zero: you can't predict the outcomes with certainty.
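To make that concrete, here is a small sketch computing Shannon entropy in bits for a few distributions; any non-zero value counts as random.

```python
from math import log2

def shannon_entropy(probs):
    # Shannon entropy in bits; 0.0 means perfectly predictable.
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))        # certain outcome -> 0.0 bits
print(shannon_entropy([0.5, 0.5]))   # fair coin       -> 1.0 bit
print(shannon_entropy([0.9, 0.1]))   # biased coin     -> ~0.47 bits, still random
```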

Most programming languages implement some type of Pseudo-Random Number Generator (PRNG). These are deterministic algorithms which use chaotic behavior to mimic the unpredictability of randomness. If you know the algorithm being applied and the initial state, you can predict the outcomes of a PRNG with absolute certainty. However, we can take inspiration from Alan Turing's "Imitation Game." Imagine you have two black box sources of numbers, one of which contains a PRNG (but you don't know anything about its initial state) while the other contains a source of "real" randomness (whatever that means). If you're permitted to apply any test you can think of, and you can't tell which is which within the scope of the sample you plan to use in your computer program, does it matter which one you use?

How can you tell that the PRNG is okay to use? Basically it boils down to trusting that the people who designed the algorithm knew what they were doing, and that the implementation holds up well against a battery of tests specifically intended to catch PRNGs in identifiably non-random behavior, such as Marsaglia's Diehard tests or the more recent Dieharder suite, or those available from NIST.
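Those suites bundle many statistical tests; in the same spirit, here is a toy frequency (chi-squared) test sketched in Python, far weaker than Diehard but illustrative:

```python
import random
from collections import Counter

# Toy frequency test: do the digits 0-9 occur about equally often?
samples = [random.randint(0, 9) for _ in range(100_000)]
counts = Counter(samples)
expected = len(samples) / 10

# For a good PRNG this statistic hovers near its 9 degrees of freedom;
# a huge value would expose identifiably non-random behavior.
chi2 = sum((counts[d] - expected) ** 2 / expected for d in range(10))
print(f"chi-squared: {chi2:.2f}")
```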

pjs
  • Cryptographic-quality PRNGs (CPRNGs), found in a great many computer systems, have a non-deterministic component, such as interaction from outside the system or inside it (e.g. air perturbations in a spinning hard drive). Thus it is not true that you can predict the outcomes of a CPRNG with absolute certainty. – zaph Feb 20 '16 at 16:54
  • @zaph That's why I discussed it in terms of PRNGs. – pjs Feb 20 '16 at 17:03
  • We are not limited to PRNGs; it is more common to use the supplied CPRNGs, and there is very little reason to revert to PRNGs. WRT the question, CPRNGs are a better answer than a PRNG. – zaph Feb 20 '16 at 17:39
  • @zaph Gotta disagree with you on that. My reading of the question was that it was asking how we can justify using deterministic functions as sources of randomness. Hence I gave my answer in terms of PRNGs. CPRNGs add non-zero entropy to the mix, and thus are a sideshow relative to what I perceive the OP's question to be. If you disagree feel free to compose your own answer involving CPRNGs. – pjs Feb 20 '16 at 17:52
  • @zaph By the way, note that the question specifically asked about the use of Python's `randint()`. This provides the context for answering in terms of PRNGs rather than CPRNGs -- Python uses Mersenne Twister, which is a PRNG. – pjs Feb 20 '16 at 19:45

How can we simulate something as natural as randomness?

TL;DR:

  • By knowing what makes that something act "randomly",
  • By being smart and making just the right simplifying assumptions so as to not make the problem too hard,
  • By having good previously-collected statistics so as to know that a statistical model is correct,
  • By having a PRNG that is good enough from which one can simulate that random process, and
  • By having an algorithm that maps outputs from that PRNG to the underlying statistical distribution.

By knowing what makes that something act "randomly"
Radioactive decay acts almost perfectly as a Poisson process. Less perfectly, goals scored in a World Cup game can also be modeled as a Poisson process (it's close enough for Las Vegas to make money). On the other hand, the outcome of a coin toss is an example of a Bernoulli process. There are lots of different kinds of random processes, and they lead to different kinds of random distributions. Knowing what is going on underneath the hood is important.
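For instance, a Poisson process such as radioactive decay can be simulated by drawing exponential waiting times between events; a minimal sketch with an assumed rate of 3 events per second:

```python
import random

RATE = 3.0   # assumed average of 3 decays per second

# In a Poisson process the waiting time between events is exponential,
# so we accumulate exponential inter-arrival times over one second.
t, events = 0.0, 0
while True:
    t += random.expovariate(RATE)
    if t >= 1.0:
        break
    events += 1
print(f"decays observed in 1 s: {events}")  # averages ~3 over many runs
```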

By being smart and making just the right simplifying assumptions
One of the most helpful tools in a modeler's bag of tricks is the central limit theorem. Add lots and lots and lots of random influences together and the end result very often looks gaussian (the "Bell Curve" alluded to in the question). Assuming a gaussian distribution is a nice simplifying assumption, but it can get one in trouble. One has to be smart enough to avoid oversimplifying assumptions.
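A quick sketch of the theorem at work: summing twelve uniform draws yields an approximately gaussian variable with mean 6 and variance 1.

```python
import random

# Each sample is the sum of 12 independent uniforms on [0, 1):
# mean = 12 * 0.5 = 6, variance = 12 * (1/12) = 1.
samples = [sum(random.random() for _ in range(12)) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean ~ {mean:.3f} (expect 6), variance ~ {var:.3f} (expect 1)")
```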

By having good previously-collected statistics
It took a while before people determined that radioactive decay was indeed a Poisson process. They determined this by having a nice history of previously taken measurements. Without previously collected statistics, all one has is a guess. Guesses are exceptionally good at biting the person who made the guess in the rear.

By having a PRNG that is good enough
There are lots of reasons for using a deterministic pseudo-random number generator. The fact that a PRNG isn't quite "random", in the sense that run #12345 of a Monte Carlo simulation can be exactly repeated, can be a good thing. If a simulated vehicle blew up or a simulated patient died in that run of a Monte Carlo simulation, any sane person would want to investigate that case in detail.
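A sketch of that repeatability: seeding a dedicated generator with the run number makes "run #12345" exactly reproducible.

```python
import random

def run_simulation(run_id, n=3):
    # One dedicated, seeded generator per run keeps runs repeatable
    # without disturbing the global random state.
    rng = random.Random(run_id)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

print(run_simulation(12345))
print(run_simulation(12345))   # identical: the interesting run can be replayed
```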

Fortunately, there are a number of very good PRNGs out there. Python uses Mersenne twister. While not the best, it is very, very good.

By having an algorithm that maps outputs from the PRNG to the underlying statistical distribution
You're toast if there's no way to translate the outcome of Mersenne twister (or whatever PRNG you are using) to the distribution at hand. Fortunately, people before us have spent a good deal of time developing algorithms that approximate a wide number of random distributions.

The question was tagged with python, so I'd be remiss not to mention python's random package and numpy's random package. The latter is even better than the built-in capabilities one gets for free with standard python. It provides a good number of algorithms that convert the integer output of Mersenne twister (for example) into a wide range of frequently encountered probability distributions (and, in some cases, distributions that are encountered only infrequently).
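A small sketch of that mapping with numpy's Mersenne Twister-backed RandomState (seeded here only for repeatability):

```python
import numpy as np

# RandomState is numpy's Mersenne Twister interface; its methods map
# the raw integer stream onto common probability distributions.
rng = np.random.RandomState(42)

print(rng.randint(1, 11, size=5))     # uniform integers 1..10
print(rng.normal(0.0, 1.0, size=5))   # gaussian (the "bell curve")
print(rng.poisson(3.0, size=5))       # Poisson with mean 3
```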

David Hammen