5

Is md5 hashing algorithm an injective function? meaning that it will generate a unique output for any given input?

If not, is there some other similar hashing algorithm that is injective?

skaffman
  • 398,947
  • 96
  • 818
  • 769
Yasser1984
  • 2,401
  • 4
  • 32
  • 55
  • 3
    You can easily prove that any hash function capable of working on inputs larger than its output cannot be injective. – thiton Jan 11 '12 at 16:17
  • I'm not sure if this argument makes sense or not, is there anything else you can demonstrate to elaborate? – Yasser1984 Jan 14 '12 at 20:58
  • [MD5 has a minimum block length of 512 bits and an output of 128 bits](http://en.wikipedia.org/wiki/MD5#Algorithm). It cannot be injective because there are 2^384 more input numbers than output numbers. You can talk about a restricted MD5, but then you'd have to define your padding approach. – thiton Jan 14 '12 at 21:07
  • thiton, following the same line of reasoning, wouldn't it be correct to say there can be no injective algorithm? – Yasser1984 Jan 14 '12 at 21:21
  • 1
    Of course there can be an injective algorithm. An algorithm implements a function, and there are injective functions. The identity function (`int foo(int i) { return i; }`) is injective. But a necessary condition for any injective function is that it has at least as many allowed output numbers than input numbers. – thiton Jan 14 '12 at 21:33
  • But if it's the length of the output is the same as the input, or even more, then it's not really hashing it is it? Wikipedia definition: "A hash function is any algorithm or subroutine that maps large data sets, called keys, to SMALLER data sets" – Yasser1984 Jan 14 '12 at 21:36
  • This "large" refers to the number of bits used to represent the input, which is for some applications not equal to the entropy in the input. Then, we still speak of a hash function even if its injective. (Scroll down on the wikipedia page to see the trivial and perfect hash functions.) Example: Suppose you have only two inputs, "aaaaaaaaaaa" and "bbbbbbbbbbb". A function that maps the first sequence to 0 and the second to 1 is injective on these inputs and can be used as a hash function. – thiton Jan 14 '12 at 21:43
  • You should accept one of these answers. – Alexander May 09 '16 at 17:41

5 Answers5

6

No, MD5 has collision vunerabilities. Other hash functions such as SHA-1 also have hash collisions, although it is much less likely than MD5.

An injective hashing function is also known as a perfect hash function. Perfect hash functions do exist, but there are certain requirements or information you will need to know about the input data before you can know that your hash is perfect.

You could look at CMPH for information on creating a perfect hash function.

cmbuckley
  • 40,217
  • 9
  • 77
  • 91
  • If a perfect hash function can exist, then can we say this comment "You can easily prove that any hash function capable of working on inputs larger than its output cannot be injective." is wrong? – Yasser1984 Jan 14 '12 at 20:55
  • The comment is wrong as a perfect hash works on a given set. For instance, if the set is {10, 20, 30, ...} and the hash function is `function hash(int i) { return i / 10; }`. I think the comment actually means the _range_ of inputs. – cmbuckley Jan 15 '12 at 16:36
0

md5 is not an injective function, because output is smaller than input, so you have more input possibilities than output.

I think sha-1 is not injective.

JuSchz
  • 1,200
  • 2
  • 15
  • 30
-1

Every hash function is NOT injective. Hashes map a large domain to a significantly smaller codomain. By the pigeon-hole principle, such a function can't be injective, because there would be items in the domain that'll map to the same item in the codomain.

For example, giving a hash function a large file as input, and receiving a short checksum. There are many more possible large files (pigeons) than there are possible short checksums (pigeon-holes), so "collisions" are surely going to occur.

Alexander
  • 59,041
  • 12
  • 98
  • 151
-2

Here is another post that may be able to answer your question.

Technically no, but kinda yes, because there is a very slim chance for them to be the same.

Here is another post discussing this issue.

Community
  • 1
  • 1
prolink007
  • 33,872
  • 24
  • 117
  • 185
-3

Practically, yes.

Realistically, it has been shown that it does have the potential for collisions. I would use SHA-1 instead. 1

KingCronus
  • 4,509
  • 1
  • 24
  • 49