4

I am implementing an application in Java and I am using SecureRandom for generating randomness. I need to be able to encrypt 1 billion files. I looked on StackOverflow for the answer of my question but I found no exact answers. Everyone is pretty vague:

  • You don’t need to reseed SecureRandom. It has a “large” period. But what is large?
  • You don’t need to reseed SecureRandom because it is a “well designed" CSPRNG. But what is a well designed CSPRNG’s period?

So I decided to do the math myself and see if anyone can check it for me. What facts do we know about SecureRandom’s current implementation in Java 8? Actually there is some controversy from the sources I found.


What I know about Java's SecureRandom implementation

  • Internally it uses SHA1PRNG when generating randomness via the calls to nextBytes(), nextInt(), nextLong() etc. (see Tersesystems, Cigital).

    The Cigital explanation of what SecureRandom does under the hood is contradicting the official explanation from the Java docs. The Official documentation from Oracle states that NativePRNG, NativePRNGBlocking, NativePRNGNonBlocking and Windows-PRNG always use the native source of randomness instead of Java’s own SHA1PRNG algorithm (also mentioned by this answer and this one). Cigital’s link says that Java always uses SHA1PRNG, but the type of SecureRandom dictates where it is seeded from (/dev/random, /dev/urandom etc.).

    Is SecureRandom always using SHA1PRNG under the hood? This is what I assume in my math calculations so if this is not the case please correct me.

  • The official Oracle documents state that SHA1PRNG truncates its output to 64 bits, from the full 160 bit hash output? Looking at the OpenJDK’s implementation of SecureRandom I see nowhere a truncation of the SHA1 output. Actually it does the opposite: it saves any unused output from the SHA1 hash for future calls to engineNextBytes(). If the official Oracle documents are the main authority on the subject then why OpenJDK’s implementation does something different? This is strange.

  • When nextBytes() is called immediately on the SecureRandom instance, it is seeded automatically by the operating system’s CSPRNG (/dev/random in Linux/Unix and CryptGenRandom in Windows) or when those are not available by custom made entropy generators like ThreadLocalRandom. But ThreadLocalRandom provides very low and slow entropy so it should be avoided when possible.

  • Nowhere did I see evidence of automatic reseeding functionality in SecureRandom. The Fortuna CSPRNG, as explained in Cryptography Engineering, has an elaborate mechanism of reseeding and I would expect that any modern CSPRNG would abstract that logic from the client and handle it automatically. I am not sure why there is such a lack of information and understanding about CSPRNG reseeding. It needs to happen if you exhaust the period of the CSPRNG. There is no doubt about it. The question is what is the period and is it a real concern in Java.


Doing the math

Due to the birthday attack we know that we should generally get a collision when we produce half the output size of the internal hash number of outputs. We know that the internal hash is truncated to 64 bits. This means that we can at most generate 2^32-1 rounds of 64 bit randomness before we have a 50% chance of collision. This will be the time we want to reseed.

(2^32-1) * 64 = 274,877,906,880 bits of randomness per seed of SecureRandom

Converted to bytes we get

(2^32 - 1) * 64 / 8 / (1024 * 1000 * 1000 ) = ~33.55 GBytes of randomness per seed of SecureRandom.

This means that we can generally expect around 33.55GB of high quality randomness generated by SecureRandom’s SHA1PRNG before it needs to be reseeded with a strong seed.

Is my math correct?

Why this matters? Because my use case is encrypting 1 billion files. Each one needs an IV (128 bits) and a Key (256 bits) or 384 bits. This comes out to 46.875GB of randomness needed in the best case. Because of this I will exhaust the period of SecureRandom if that period is only 33.55GB. A reseeding strategy will be needed in this case, which I’ll post as a separate question.

My issue with collisions and SecureRandom

I have a lot to learn about cryptography so my understanding is improving all the time. I didn’t find any information about collisions being a problem for CSPRNG, but guessing the internal state of the generator is detrimental (see Wiki and Schneir’s cryptanalysis paper on CSPRNG).

Here is my train of thought for collisions, though. I generate a table of (2^32-1) random, unique 64 bit values (this comes out to be ~33.55GB so pretty achievable). Then I have 50% chance of guessing any 64 bit output of SecureRandom. An IV/Key, for example, is comprised of 2 x 64 =128 bits. This means I can guess the first 64 bits of the IV/Key with 50% probability and the second 64 bits with 50% probability, without needing to know the internal state of the CSPRNG. The combined probability of guessing the full key will be less than 50%, but much more than the negligible probability we are looking at when working with such cryptographic primitives. This can lead to generating weak keys which is what I am trying to avoid. In other words, I believe that 64 bit output of a CSPRNG is too small for serious cryptographic work, solely based on the collision properties of a 64 bit output (because of the birthday attack). If, on the other hand, SecureRandom was using a 512 bit SHA-2 hash and truncated it to 256 bit output then the collision attack I am theorizing here would be impossible due to the sheer size of the key space. Am I making sense or I got all of this wrong?

Community
  • 1
  • 1
Stan Ivanov
  • 299
  • 2
  • 7
  • 2
    Why do you worry about a collision? The next bytes will be different, right? You need to consider the internal state of the algorithm rather than the output size. Note that this question is better asked at https://crypto.stackexchange.com rather than SO, even though it talks about a Java specific RNG. Note that the SHA1PRNG has changed implementation previously, it's not that well defined. – Maarten Bodewes Jun 22 '16 at 22:35
  • @MaartenBodewes, thanks for the tip. This is my first post on SO and I'll make sure I post future questions to the right place. I answered your question at the end of my post. – Stan Ivanov Jun 23 '16 at 13:39
  • You're not making sense. Just jotting down numbers is not the same as guessing them. Here, I give you the values 5, 1 and 2 - they are possible outputs for a guessing game between 1 and 10. Now guess any number between 1 and 10.... – Maarten Bodewes Jun 23 '16 at 14:36
  • Related is the notion of cycles: http://security.stackexchange.com/questions/26043/how-big-is-the-risk-of-hash-fixed-points-cycles . Note that cycles are a problem of the inner *state* rather than the amount of output. Heck, you may take one *bit* of output as long as you don't enter a cycle within the state. I've already made a table with all possible outputs :P – Maarten Bodewes Jun 23 '16 at 14:39
  • Yes, of course! You are absolutely right! I confused the inner state cycle with the output of the CSPRNG. Unless I guess the inner state there is no way I can predict future output or learn past output. After I research the subject in-depth I’ll re-write my question to address the inner state cycle length of SecureRandom. – Stan Ivanov Jun 24 '16 at 12:37

0 Answers0