0

Is the function Random in programming languages not biased? After all, the algorithms need to be based on something, and that can generate bias. According to this website https://mathbits.com/MathBits/CompSci/LibraryFunc/rand.htm, rand() function needs a start number, called seed. It states that

the rand( ) function generates a pseudo-random sequence

I don't completely understand the logic behind it. If it's not really random (pseudo-random), is there a way to make it perfectly random?

B-M
  • 1,248
  • 9
  • 19
  • 3
    See this: https://www.random.org/randomness/ – Elias Soares Jul 27 '19 at 17:59
  • 1
    Pseudorandom number generators can be biased. [RANDU](https://en.wikipedia.org/wiki/RANDU) was quite broken. The most common prng today is Mersenne Twister which is good for most uses, but it's not considered cryptographically secure. – Håken Lid Jul 27 '19 at 18:42
  • 1
    @EliasSoares That's quite informative. Also, from the same source, this is very interesting: https://www.random.org/analysis/#visual – B-M Jul 27 '19 at 18:46
  • So, the True Random Number Generators are really random? – B-M Jul 27 '19 at 18:52
  • 1
    Any sequence of values which is repeatable on demand is not random, it is deterministic (by definition, since it can be repeated). Pseudo-random number generators produce deterministic sequences. Good ones produce sequences which can't be distinguished from real randomness using statistical tests that evaluate various properties which a sequence of identically uniformly distributed values sampled independently should exhibit. – pjs Jul 27 '19 at 19:08
  • However, if you know the deterministic algorithm being used in a PRNG and can infer the state values being manipulated, you can reproduce the sequence at will. – pjs Jul 27 '19 at 19:14
  • I understand. So, let’s say that if someone wants to generate votes for a presidential election and uses PRNG to generate the SIN number of people, that could be detected by authorities if they do some statistical tests or use the same PRNG algorithm to reproduce the sequence? – B-M Jul 27 '19 at 23:55

2 Answers2

2

By definition, functions map a given input to a given output. For a pseudorandom generator, that means it maps a given "seed" to a given sequence of random-looking numbers. For such a generator to even begin to generate "random" numbers, the seed has to have some randomness itself. And there are many sources of the seed for this purpose, including—

  • high resolution timestamps,
  • timings of input devices,
  • thermal noise,
  • atmospheric noise, and
  • combinations of two or more of the sources above.

Also, in general, the longer the seed is, the greater the variety of "random" sequences a pseudorandom generator can produce.


Different pseudorandom number generators (PRNGs) have different qualities. If a particular PRNG is itself "bad", no seed selection strategy can make it "better". The choice of random number generator (RNG) will depend on what kind of application will use the random numbers, and you didn't really specify what kind of application you have in mind:

  • If the random numbers are intended to further information security in any way (e.g., they serve as random encryption keys, nonces, or passwords), then only a cryptographic RNG will do.
  • If the "random" numbers have to be reproducible at a later time, then a seeded PRNG of high quality has to be used. There are several good and bad choices for such a PRNG. In this sense, the rand function in C uses an unspecified algorithm, which hampers the goal of reproducible "randomness".
Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • So, according to the website provided in the comments, "TRNGs extract randomness from physical phenomena and introduce it into a computer.". What you described is still pseudorandom? – B-M Jul 27 '19 at 18:50
  • 1
    In general, a TRNG observes a physical phenomenon and extracts bits (zeros and ones) from it. In general, however, such bits are not necessarily uniformly distributed (e.g., there may be very long runs of zeros or ones in those bits), so usually a second step called _randomness extraction_ processes those bits to bring them to a uniform distribution. Examples of this include cryptographic hash functions and von Neumann unbiasing. Randomness extraction is a deterministic algorithm that behaves the same way for a given sequence of bits. – Peter O. Jul 27 '19 at 18:55
  • TRNGs can be the _source_ of a seed for a pseudorandom generator, or can produce random numbers themselves. – Peter O. Jul 27 '19 at 19:00
  • 1
    You're conflating seed and state, they can be the same for simple generators such as LCGs but in general they're not the same thing. See [this](https://stackoverflow.com/a/48505649/2166798) for details. Also note that we assess PRNGs by how well the sequences of numbers they produce mimic true randomness. PRNGs are deterministic functions of their state and will cycle when they hit a duplicate state, so a seed is an entry point to the cycle, not a source of randomness. – pjs Jul 28 '19 at 17:03
1

Historically, the pseudo-random number functions in most programming languages have been bad. Old algorithms running on deterministic machines produced less than perfect results.

But things are changing. All modern microprocessors have hardware-based entropy generation functions, and modern applications like online banking have driven the development of better algorithms. It totally depends on the OS, language, and library you have in mind. There are very good options, but you have to know what they are, because the bad options are still around.

Something like C language's rand() is probably the worst. Getting bytes from Linux (or MacOS) /dev/random is very good. Cryptography libraries have good algorithms. It also depends on the application--for cryptography, you need very good quality random numbers. For something like Monte Carlo integration, you need lots of numbers quickly but not necessary perfect entropy--something like a PRNG seeded by /dev/random would be just fine.

Lee Daniel Crocker
  • 12,927
  • 3
  • 29
  • 55
  • 1
    For Monte Carlo you generally don't want to seed with `/dev/random` because reproducibility can play an important role. It's a necessity for many "[variance reduction](https://en.wikipedia.org/wiki/Variance_reduction)" techniques. In recent years reproducibility is also becoming a requirement for publication in scientific journals. This also ties in with a comment I made on Peter O.'s post -- valid distributional properties are a function of a PRNG's algorithm, not the seed. – pjs Jul 28 '19 at 18:14