
Further to my question here, I'll be using the random_compat polyfill (which uses /dev/urandom) to generate random numbers in the 1 to 10,000,000 range.

I realise that, provided I code my project correctly, the above tools should produce good (i.e. random and secure) data. However, I'd like to add extra sources of randomness into the mix, just in case six months down the line I read that there is a patch available for my specific OS version to fix a major bug in /dev/urandom (or any other issue).

So, I was thinking I can get numbers from random.org and fourmilab.ch/hotbits

An alternative source would be some logs from a web site I operate, timed to the microsecond. If I ignore the date/time part and just take the microseconds, the value has in effect been generated by when humans decide to click on a link. I know this may be classed as haphazard rather than random, but would it be good for my use?

Edit re timestamp logs: I will use PHP's microtime(), which will create a log like:

0.**832742**00 1438282477
0.**57241**000 1438282483
0.**437752**00 1438282538
0.**622097**00 1438282572

I will just use the bolded portion.

So let's say I take two sources of extra random numbers, A and B, and the output of /dev/urandom, call that U and set ranges as follows:

A and B are 1 - 500,000

U is 1 - 9,000,000

Final random number is A+B+U

I will be needing several million final numbers between 1 and 10,000,000

The pool of A and B numbers will only contain a few thousand values, but I think that by using prime-number pool sizes I can stretch that into millions of A and B combinations, like so:

// Placeholder pools: the real ones will hold integers from the two extra sources,
// each with a larger prime number of members than the 7 and 11 used here.
// With 7 and 11 members the sequence of pairs repeats after 7 * 11 = 77 steps.
$numbers = array(1, 2, 3, 4, 5, 6, 7);
$colors = array(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110);

$num_numbers_required = 77; // placeholder count

$ni = 0;
$ci = 0;

for ($i = 0; $i < $num_numbers_required; $i++) {
    $offset = $numbers[$ni] + $colors[$ci];

    $ni = ($ni + 1) % count($numbers); // wraps at the prime 7
    $ci = ($ci + 1) % count($colors);  // wraps at the prime 11
}

Does this plan make sense - is there any possibility I can actually make my end result less secure by doing all this? And what of my idea to use timestamp data?

Thanks in advance.

Paul

1 Answer


I would suggest reading RFC 4086, section 5. Basically, it talks about how to "mix" different entropy sources without compromising security or introducing bias.

In short, you need a "mixing function". You can do this with xor, where you simply set the result to the xor of the inputs: result = A xor B.
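As a rough illustration of XOR mixing, here is a minimal sketch assuming PHP 7+ and two inputs that are byte strings of equal length. The helper name `xor_mix` is hypothetical, and the two `random_bytes()` calls merely stand in for the real sources A and B.

```php
<?php
// Hypothetical helper (not part of any library): XOR two equal-length byte strings.
function xor_mix(string $a, string $b): string
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Inputs must have the same length');
    }
    // PHP's ^ operator applied to two strings XORs them byte by byte.
    return $a ^ $b;
}

$a = random_bytes(16); // stand-in for source A
$b = random_bytes(16); // stand-in for source B
$result = xor_mix($a, $b);

echo bin2hex($result), PHP_EOL;
```

Note that the mixing operates on raw bytes rather than on decimal integers.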

The problem with xor is that if the numbers are correlated in any way, it can introduce strong bias into the result. For example, if bits 1-4 of A and B are both taken from the current timestamp, those bits are identical in both inputs, so the result's first 4 bits will always be 0.
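The cancellation is easy to reproduce. This sketch assumes, purely for illustration, that both sources prepend the same 4-byte Unix timestamp (a whole-byte variation on the 4-bit example) followed by 4 unrelated random bytes.

```php
<?php
// Assumption for illustration: both inputs start with the same 32-bit timestamp.
$timestamp = pack('N', 1438282477); // 4 bytes, big-endian

$a = $timestamp . random_bytes(4);
$b = $timestamp . random_bytes(4);

$mixed = $a ^ $b; // byte-wise XOR of the two strings

// The shared prefix XORs to zero, so the first 4 bytes carry no entropy at all.
echo bin2hex(substr($mixed, 0, 4)), PHP_EOL; // prints 00000000
```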

Instead, you can use a stronger mixing function based on a cryptographic hash function. So instead of A xor B you can do HMAC-SHA256(A, B). This is slower, but also prevents any correlation from biasing the result.
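A minimal sketch of hash-based mixing, using PHP's built-in `hash_hmac()`. The helper name `hmac_mix` is hypothetical, and treating A as the message and B as the key is an assumption, since the notation HMAC-SHA256(A, B) does not say which is which. The inputs deliberately share a timestamp prefix to show that correlated bytes do not zero out part of the output.

```php
<?php
// Hypothetical helper: mix two byte strings with HMAC-SHA256.
// Assumption: $a is used as the message and $b as the key.
function hmac_mix(string $a, string $b): string
{
    // The fourth argument (true) requests 32 raw bytes instead of a hex string.
    return hash_hmac('sha256', $a, $b, true);
}

// Correlated inputs: both begin with the same 4-byte timestamp.
$timestamp = pack('N', 1438282477);
$a = $timestamp . random_bytes(4);
$b = $timestamp . random_bytes(4);

$mixed = hmac_mix($a, $b);

echo bin2hex($mixed), PHP_EOL;
```

The output is always 32 bytes regardless of input length, so it would still need to be reduced to the target range in a separate step.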

This is the strategy that I used in RandomLib. I did this because not every system has every method of generation. So I pull as many methods as I can, and mix them strongly. That way the result is never weaker than the strongest method.

HOWEVER, I would ask why. If /dev/urandom is available, you're not going to get better than it. The reason is simple: even if you call random.org for more entropy, your call is encrypted using random keys generated from /dev/urandom. That means that if an attacker can compromise /dev/urandom, your server is toast and you will be spinning your wheels trying to make it better.

Instead, simply use /dev/urandom and keep your OS updated...
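In practice that can be as short as the sketch below. `random_int()` is built into PHP 7+ and is supplied by the random_compat polyfill on PHP 5; it draws from the operating system's CSPRNG and returns an unbiased integer in the inclusive range. The count of 5 is a placeholder for the several million values the question needs.

```php
<?php
$count = 5; // placeholder; the question calls for several million values

$values = [];
for ($i = 0; $i < $count; $i++) {
    // Uniform integer in [1, 10000000], both ends inclusive.
    $values[] = random_int(1, 10000000);
}

print_r($values);
```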

ircmaxell
  • Thanks for your reply. If I understand correctly, say A is an integer value (not binary) of 30234 and B is 4354, then `$result = $a ^ $b`, which is 26392. Do I understand correctly? Why is XOR recommended over a simple addition (30234 + 4354 = 34588)? Re your concerns over the timestamp, I've updated my question with an example value, and also only A or B will come from a timestamp, not both. Does this address that? – Paul Aug 05 '15 at 16:49
  • Re why I want to do this, and to play Devil's advocate, in my application it would be a problem if there was a bug in `/dev/urandom` which was discovered after I generate the numbers as they will be in use for some time. – Paul Aug 05 '15 at 16:49
  • Even if your application is hardened against a hypothetical `/dev/urandom` weakness, the rest of your system sure isn't. If you're concerned, I'd focus on seeing how `urandom` works and looking for bugs / suggesting improvements to the kernel. – Scott Arciszewski Aug 06 '15 at 13:51
  • @Paul no, the mixing is done at the byte level, not the integer level. And you *can't* harden against a bug in /dev/urandom. The OS uses it for too much. That means that while your random numbers may be "secure", the OS isn't, and hence you can't know whether someone is looking at those "secure" numbers. Hence it's safe to just trust /dev/urandom – ircmaxell Aug 06 '15 at 21:19