8

I was wondering whether md5, sha1 and anothers return unique values.

For example, sha1() for test returns a94a8fe5ccb19ba61c4c0873d391e987982fbbd3, which is 40 characters long. So, sha1 for strings larger than 40 chars must be the same (of course it's scrambled, because the given input may contain whitespaces and special chars etc.).

Due to this, when we are storing users' passwords, they can enter either their original password or some super-long one, which nobody knows.

Is this right, or do these hash algorithms provide really unique results - I'm quite sure it's hardly possible.

BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
Mikulas Dite
  • 7,790
  • 9
  • 59
  • 99
  • 2
    It's not unique, it's just "unique enough" – Rex M Jun 17 '10 at 09:14
  • 1
    @Rex M: Keep in mind, it doesn't *want* to be unique, because if it was it would imply a direct mapping, which would mean it can be reversed. – Noon Silk Jun 17 '10 at 09:19
  • 1
    Nobody here really mentioned, but sha1 is the *wrong* algorithm to use for password hashing, bcrypt is significantly safer. be sure to read http://stackoverflow.com/questions/800685/which-hash-function-should-i-choose/817121#817121 – Sam Saffron Jun 17 '10 at 10:17
  • Sam yes you're right (SHA1 is dead), but your comments in that link are in incredibly bad form, IMHO: I would advise people review: http://valerieaurora.org/hash.html and use the algorithms approved by NIST (that is SHA2 Family) for cryptographic purposes. – Noon Silk Jun 17 '10 at 10:21
  • @silky, if you are doing a first pass search for duplicate files MD5 is plenty good, if you are hashing passwords SHA2 family is bad cause it is fast. bcrypt is slow which makes it a lot less vulnerable to brute-force attacks. – Sam Saffron Jun 17 '10 at 10:44
  • @silky, also if you read through explicitly I am pretty much saying, if you need a "secure" hash function don't use MD5. Sometimes you need to hash stuff and don't need security. – Sam Saffron Jun 17 '10 at 10:48
  • @Sam: It goes without saying that a broken cryptographic hash is still good for non-cryptographic purposes. The second part of your sentence doesn't make sense. – Noon Silk Jun 17 '10 at 10:48
  • @silky my recommendation is "Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks." or in english, Use MD5 if you do not need a cryptographically secure hash, I'm not following what is so controversial about my recommendation. – Sam Saffron Jun 17 '10 at 10:52
  • @Sam: This isn't the place to have this discussion. It's too complex, but my position is: Don't use a broken hash for anything. Even file comparision (this is the classic attack on MD5 anyway, so you'd need to think through the results a clash would have in your given system. Also, if you review my link, you will find your opinion in the column "non-expert". I don't mean that as an insult, but I don't think it's a wise strategy. My initial comment was in regards to your main post, which I don't have time to comment on in a worthwhile manner at the moment. Hope I've clarified a little. – Noon Silk Jun 17 '10 at 11:05
  • @silky ... and you are implying that stuff is A-OK if you use SHA256 + salt for your password storage scheme which is what the user is doing and is wrong http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html – Sam Saffron Jun 17 '10 at 11:58
  • @Sam: Please start a thread on any of the relevant security mailing lists if you wish to continue this discussion (I'll see it). It's off-topic to this thread. – Noon Silk Jun 17 '10 at 12:05
  • @silky to quote the author of this post "when we are storing users' passwords" its totally on topic. @Mikulas is not going to read your security mailing lists – Sam Saffron Jun 17 '10 at 12:50
  • @Sam: I've tried to be polite and patient with you but this is ridiculous. I don't know what it will take for you to understand that you will get no agreement from me on any of your points within this thread. I will not respond further; feel free to contact me offline or via the ways I've already explained it you are genuintely interested in the discussion. I suspect you aren't. You've also not totally understood (it seems) Matasano's position. – Noon Silk Jun 17 '10 at 13:05
  • @Sam: There are faster alternatives to MD5 if you don't need a "cryptographically secure hash" (ie. you just need a checksum), such as CRC32 – BlueRaja - Danny Pflughoeft Jun 17 '10 at 15:47

5 Answers5

13

(Note: You're asking about hashing functions, not encryption).

It's impossible for them to be unique, by definition. They take a large input and reduce its size. It obviously follows, then, that they can't represent all the information they have compressed. So no, they don't provide "truly unique" results.

What they do provide, however, is "collision resistant" results. I.e. they try and show that two slightly different datas produce a significantly different hash.

Noon Silk
  • 54,084
  • 6
  • 88
  • 105
10

Hashing algorithms (which is what you are referring to) do not provide unique results. What you are referring to is called the Pigeonhole Principle. The number of inputs exceeds the number of outputs, so multiple inputs must be mapped to the same output. This is why the longer the output hash the better, because there are less number of inputs mapped to an output.

Encrypting something must provide a unique results, because you can encrypt a message and decrypt it and get the same message.

kemiller2002
  • 113,795
  • 27
  • 197
  • 251
4

SHA1 is not encryption algorithm, but a cryptographic hash function.

You are right - since it maps arbitrary long input to a fixed size hash there can be collisions. But the idea of a cryptographic hash function is to make it impossible to create such collisions "on demand". That's why we call them one-way hash functions, too.

Quote (source):

The ideal cryptographic hash function has four main or significant properties:
* it is easy to compute the hash value for any given message,
* it is infeasible to find a message that has a given hash,
* it is infeasible to modify a message without changing its hash,
* it is infeasible to find two different messages with the same hash.

Kzqai
  • 22,588
  • 25
  • 105
  • 137
tanascius
  • 53,078
  • 22
  • 114
  • 136
1

Hashing algorithms never guarantee a different result for a different input. That's why hashing is always used as a one-way "encryption".

But you have to be realistic, a 160-bit hashing algorithm can have 2^160 possible combinations, which is... a lot! (1 with 48 zeroes)

Philippe Leybaert
  • 168,566
  • 31
  • 210
  • 223
1

These are not encryption functions, but hashing ones.

Hashing, by definition, can have two different strings collide (map to the same value) for the very reasons you mention. But that is usually not relevant because:

  1. Cryptographic hashes (such as SHA1) try hard to make the collision probability for similar strings (very, very) low
  2. You cannot deduce the original string from the hash.

These two mean that you cannot take a hash and easily generate one of the strings that map to it.

Vinko Vrsalovic
  • 330,807
  • 53
  • 334
  • 373