1

It is well known that pseudo-random numbers are not cryptographically secure.

An extremely basic way I can think of generating a pseudo-random number could be to get the time-stamp at the time the code runs and return the lowest significant figures.

For example the outcome of import time; return time.time_ns/100 % 1000 returns a number between 0 and 1000 that should be almost impossible to predict unless you know exactly the time at which the code run (with a nanosecond precision) and all the overhead execution times of the code.

We could then use one or more numbers generated this way to run a chaotic function (as a logistic map) to generate number that should be extremely hard to predict.

One extremely naïve implementation could be:

import time

def random():
    return time.time_ns()/100 % 1000 / 1000

def logistic():
    r = 3.9 + random()/10
    N = 1000 + int(random()*100)
    x = random()
    
    for _ in range(N):
        x = r*x*(1-x)

    return x

print(logistic())

However, I'm quite sure that no one would consider this to be cryptographically secure. Why is this? How could one predict/attack such method?

EDIT: My question is a theoretical question to understand the reasons why building a true RNG is so difficult and hard. I would never use a RNG I wrote in a real code, even less if it had to be cryptographically secure. However, it would be nice to have a bit more details on WHY it's hard to achieve such result so that hundreds/thousands of researcher spent their life working on this topic.

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
Luca
  • 1,610
  • 1
  • 19
  • 30
  • It's not just the seed, but also the algorithm. Related: https://stackoverflow.com/questions/68081216/why-using-a-time-based-pseudo-random-number-is-not-cryptographically-secure ; https://stackoverflow.com/questions/62375129/how-to-get-truly-random-data-not-random-data-fed-into-a-prng-seed-like-csrngs – Peter O. Jun 22 '21 at 09:57
  • Not a binding answer, but consider that even if all the numbers were completely independent, you only drew about 20 bits of randomness. But they aren't even independent, because the duration between them means they will be relatively close together. As a naive test, ``{time.time_ns()/100 % 1000 for _ in range(3)}`` has all results in a range of only at most 60 on my machine. – MisterMiyagi Jun 22 '21 at 09:57
  • @MisterMiyagi I agree... but shouldn't the logistic map generate very different results for relatively close inputs? (that's the definition of chaotic function) – Luca Jun 22 '21 at 09:58
  • @Luca That strongly depends on what is known to an attacker, and what the scenario is. Usually, you would assume everything but the random numbers is compromised. A naive attack would be to just run the generating function with likely inputs – of which you assume to be 1000*1000*1000 but due to the chosen "randomness" it's closer to 1000*5. So your approach is 200000 times less resilient against a naive brute force attack than with proper randomness. – MisterMiyagi Jun 22 '21 at 10:05
  • Related: https://stackoverflow.com/questions/67019302/security-issues-with-generating-passwords-in-python/67019485#67019485 – Peter O. Jun 22 '21 at 10:22
  • *...should be almost impossible to predict...* what if your attacker is not interested in predicting, he's just going to try all the possible values to see which one works? – President James K. Polk Jun 22 '21 at 13:55

2 Answers2

3

It is well known that pseudo-random numbers are not cryptographically secure.

Is it really? All cryptographic systems I know do use pseudo random generators. There are two major point for a cryptographically secure pseudo-random sequence:

  • the probability of any value should be the same (as much of possible) to keep entropy high. If on a 16 bit number the generation algorithm consistently sets 8 bits to zero, you only have a 8 bit generator...
  • knowledge of a number of consecutive values shall not allow to predict the next one(s) - this one is really the tricky part...

Relying on the nano-seconds part of the time blindly assumes that the internal clock of the system will not have prefered values for the low order bits... and nothing ensures that!

Common systems only rely on the hardware randomness of the time to build a seed.

And when it comes to security and cryptography to rule is do not roll your own unless you are an established specialist (and if you are, you already know that any new algorithm or implementation should be carefully reviewed by peers). Hell is always hidden in the details, and something that looks very clever at first sight could introduce a major flaw. Here relying on randomness of the system clock is not secure.

The fact is that building good algorithms and implementations is very hard. And having others to trust them takes even more time. There is nothing bad in experimenting new ideas, and studying how they are validated is even more interesting and you would learn a lot. But my advice is to not use you brand new algo for anything else than tests, and never for mission critical operations.

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • Thanks for the first part of your answer, but just to be clear **I'm not thinking of building my own RNG**. As you said, I'd never do it. My question was more a theoretical question to understand the reasons why building a true RNG is so difficult and hard (hence, the second part of your answer about the fact that it's hard and you should not do it is a bit pleonastic). However, it would be nice to have a bit more details on WHY it's hard, that was the real scope of my question – Luca Jun 22 '21 at 10:21
  • 2
    @Luca: I am sorry I have missed your point, but a number of beginners actually think that crypto is easy and try to use they own super secret algo instead of a well known implementation. Hence my answer. If you want to go further, I can remember that building a nice algo heavily uses mathematics and group theory. And I have not played with those for a too long time now... Maybe you could try to search some posts on [Cryptography](https://crypto.stackexchange.com/) to get *starting points*... – Serge Ballesta Jun 22 '21 at 10:35
2

For cryptography, it is not only desirable that individual numbers are hard to predict but also that multiple numbers are hard to predict – that is, numbers should (appear to) be independent.
Notably, they should be independent even if an attacker knows the algorithm.

That is problematic with time based "randomness", since by design the next time is after the previous time. Worse, there is a direct relation "how much" one number is after the other – namely how much time has elapsed since fetching the previous number.
For example, drawing numbers in any predictable manner such as a loop gives a significant correlation between numbers:

>>> import time
>>>
>>> def random():
...     return time.time_ns()/100 % 1000
...
>>> # some example samples
>>> [random() for _ in range(5)]
[220.0, 250.0, 250.0, 260.0, 260.0]
>>> [random() for _ in range(5)]
[370.0, 390.0, 400.0, 410.0, 410.0]
>>> # how much spread there is in samples
>>> import statistics
>>> [statistics.stdev(random() for _ in range(5)) for _ in range(5)]
[16.431676725154983, 11.40175425099138, 11.40175425099138, 8.366600265340756, 11.40175425099138]
>>> [statistics.stdev(random() for _ in range(5)) for _ in range(5)]
[16.431676725154983, 11.40175425099138, 11.40175425099138, 11.40175425099138, 11.40175425099138]

Notably, the critical part is actually not that the spread is low but that it is predictable.

The other problem is that there really is not much that can be done about it: The "randomness" contained in time is inherently limited.

  • On one end, time randomness is constrained by resolution. As can be seen by the output, my system clock only has sufficient resolution for multiples of 10 – so it can only draw 100 distinct numbers.
  • On the other end, time randomness is constrained by the program duration. A program that, say, draws during 1 ms only can draw randomness from that duration – that's only 1 000 000 distinct nanoseconds.

Notably, no algorithm can increase that randomness – one can shift it or even it out across some range, but not create more randomness from it.


Now, in practice one could say that it is still somewhat difficult to predict the values drawn this way. The point however is that other means are more difficult to predict.

Consider that some attacker knew your algorithm and tried to brute-force a result:

  • With a true random generator for actual 3*[0, 1000) they must try 1000*1000*1000 numbers.
  • With the time based random generator for seemingly 3*[0, 1000) they must try roughly 100*5 numbers (low resolution, low spread).

That means time based randomness makes the approach 2_000_000 times less robust against a brute force attack.

MisterMiyagi
  • 44,374
  • 10
  • 104
  • 119