3

Lots of code does something like this to generate some kind of secret:

sha1(random())

Why not simply use a random number? I realize that in the past, some OS random generators weren't so great, but I'm not sure that's still true, and even if it were, how does sha1'ing it make it better?

Johannes Ernst
  • 3,072
  • 3
  • 42
  • 56

3 Answers3

3

I suspect that most often the hash function is misused as encoding function. One needs an alphanumeric code of a certain length, and the hash provides such a string. It's easy and it looks unpredictable.

It is easy to forget, that the random part of md5(random()) are not the 32 hex characters of the md5, instead there will be only as many possible results as the random() function delivers. If one knows how the random function works, one could precalculate a range of possible values.

The code would be more random (better entrophy), if you really used the random source of the OS, and request as many bytes as seems necessary. Those bytes you can then encode with a function like base64_encode(). For tokens, a base62 encoding would be ideal, because it is very compact and doesn't return special characters, though it's a bit hard to implement.

To answer your question: A random number would be better, but it has to be "really" random. If you use it as alphanumberic numbers with only '0'-'9', then your token will be long, it can be shortened with an encoding function.

martinstoeckli
  • 23,430
  • 6
  • 56
  • 87
2

If you're generating a secret, you should use a cryptographically secure random or pseudo-random number generator. Hashing a non-cryptograpically secure random number is a form of security by obscurity, not a real security measure.

It adds a step to figuring out the secret, which is probably why it's done, but it doesn't make the secret safe.

Xander
  • 479
  • 1
  • 13
  • 25
0

You wouldn't want to use a random number instead of a hash, because then it becomes easier for an attacker to guess your source and create his own "secret" (especially if you use the system clock, never do that. See Is using microtime() to generate password-reset tokens bad practice). A truly random number is not any better from a randomization standpoint than a hash of a random number but it obscures the pattern such that a human would not be able to recognize any pattern of changes (such as a seed value). Additionally, it does enhance security because a brute force attack takes much longer when the attacker must hash each attempt before submitting. This is especially true with a strong hash like SHA-256. The amount of computational time required for a brute force attack increases significantly when processor intensive operations are required for each submission. This is the reason that Linux went from storing password hashes as MD5 to SHA.

Additionally, SHAs and other hashes have some mathematical benefits that are good for security. Firstly, they always output the same length, regardless of the input length. Secondly, a small change in the input results in a large change in the output, and for a small change in the output, the corresponding input that would be required to obtain that output would differ much more than the previous input. This makes it really hard for an attacker to generate the hash they want by selectively feeding input, and it makes it virtually impossible for two similar inputs to result in the same or even very close outputs.

This is a deep topic that can have an entire textbook written about it, but these are just some high level reasons.

Community
  • 1
  • 1
Freedom_Ben
  • 11,247
  • 10
  • 69
  • 89
  • What do you mean by “vice versa”? – Gumbo Apr 26 '13 at 04:59
  • How is it that the hash of a random number would be more random than the random number itself? If there was some (installation-specific) salt, that'd be different, but in the code I usually come across, I haven't seen that. – Johannes Ernst Apr 26 '13 at 14:35
  • @Gumbo By vice versa, I mean that if you were to change the output by a small amount, the corresponding input that would be required to obtain that output would differ much more than the previous input. I'll update the answer with this. – Freedom_Ben Apr 26 '13 at 16:39
  • @JohannesErnst The hash isn't more random mathematically, but it obscures the pattern such that a human would not be able to recognize. Additionally, it does enhance security because a brute force attack takes much longer when the attacker must hash each attempt before submitting. This is especially true with a strong hash like SHA-256. I'll add this tot he answer. – Freedom_Ben Apr 26 '13 at 16:40
  • @JohannesErnst To clarify, neither my original post nor the updated version claimed that a hash of a random number is more random than the random number. Mathematically this is not true. – Freedom_Ben Apr 26 '13 at 16:49
  • 1
    @Freedom_Ben - If the token is generated from the random source of the OS, there should be no pattern at all. If it is generated from microtime as you mentioned, then you are right that it could help obscure the pattern. The time for brute-forcing will not increase though, because the result of the hashing is the token itself, that you send back to the server by clicking the link. Of course this token should not be stored plaintext in the database, it should be handled like a password and therefore be hashed with a slow key-derivation function like BCrypt (MD5 and SHA-* are all ways too fast). – martinstoeckli Apr 26 '13 at 20:49
  • @martinstoeckli there's no such thing as a random source of the OS. There are ways to get pretty random (such as white noise measurements), but there is still going to be some sort of pattern. The trick is to make it a pattern that repeats so infrequently as to be infeasible to decipher. This is where the hashing comes in. If the token isn't hashed, then generating samples to determine a pattern is easier. If they are hashed, then you have to hash your input value (which follows the pattern) before you can compare to see if the hash is correct. This greatly slows down the process of – Freedom_Ben Apr 26 '13 at 23:21
  • determining the pattern. If you are just brute forcing by cycling through every possible hash, without caring to know the input value for that hash, then yes it is no slower. But then once you have the right hash, in order to get the input value that would generate that hash you have to brute force again and this time hash the numbers until you have a match. If you just guess the hash but don't have the input value that created that hash, it doesn't help you figure out the pattern at all. So you have hacked one token, but you're no closer to being able to predict future tokens. – Freedom_Ben Apr 26 '13 at 23:24
  • Only a very unsophisticated attacker would just try throwing random hashes at a service without caring about the input. No matter which approach you take, it is going to make it slower for the attacker if you hash the value. If the attacker is harvesting tokens, then they have to wait for the server to do the hash. If they are creating their own, they have wait for their system to hash it, or be left with a meaningless hash. Some system somewhere has to perform the computation for the hash, and this slows down the process. I would also point out there are other reasons for using a hash. – Freedom_Ben Apr 26 '13 at 23:29
  • @Freedom_Ben - From your well thought answers i conclude, that you know exactly what the random source of the OS is. So i will only explain why it works. Our goal is to get an unpredictable token. The best we can do is to get something really random, but a computer cannot produce this. That's why every operating system feeds a random source with external events like mouse/keyboard events, disk access or requests from the internet. They are random, because they are generated by human beings. An attacker cannot control all of these events, so he cannot predict the content of the source. – martinstoeckli Apr 27 '13 at 09:48
  • @martinstoeckli It sounds to me like we are in agreement. The secure-random sources the OS uses are pretty darn good, good enough that we are not worried about using them provided we add enough computational overhead against a brute force attack from any non-super computer system. Whether that overhead comes from a computationally intensive calculation like a hash, or simply the fact there are so many possibilities that it would take a billion years to try them all, doesn't really matter too much. Thanks for the discussion! – Freedom_Ben Apr 27 '13 at 17:41