121

Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?

BoltClock
  • 700,868
  • 160
  • 1,392
  • 1,356
  • 9
    Which md5 transformation? The mathematical one (from any bitstring to 128 bits) or the one from any bytestring to a 32-char hexstring (the practical one)? It isn't obvious that the answers for both of them are the same... – Rafał Dowgird May 06 '09 at 15:27
  • 5
    Well, they _are_ the same answer, right? We know there exists no non-128-bit-long x for which `md5(x) == x`, because `md5(x)` _is_ 128 bits long. Therefore, there is a fixed point in md5 for arbitrarily-sized input _if and only if_ there is a fixed point in md5 on the 128-bit domain. – the paul Apr 26 '15 at 18:09
  • 1
    I don't think they are the same answer since for the practical 32 characters hexstring it is an arbitrary choice whether you represent the hex digits in upper case [A-F] or in lower case [a-f]. Both representations correspond to the same 128-bit number but they will yield different hashes when provided as inputs to MD5. So the probability that there is a fixed point in **either** of the representations is in fact `1-(1/e)*(1/e) ≈ 86.47%` – Dušan Apr 19 '17 at 08:39
  • 1
    Let’s search for it ;) - https://github.com/zvibazak/Nice-MD5s – zvi Dec 31 '20 at 21:14

7 Answers7

143

Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, then the probability that any given 128-bit string is a fixed point is 1/2128.

Thus, the probability that no 128-bit string is a fixed point is (1 − 1/2128)2128, so the probability that there is a fixed point is 1 − (1 − 1/2128)2128.

Since the limit as n goes to infinity of (1 − 1/n)n is 1/e, and 2128 is most certainly a very large number, this probability is almost exactly 1 − 1/e ≈ 63.21%.

Of course, there is no randomness actually involved – either there is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace – if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).

Michael Myers
  • 188,989
  • 46
  • 291
  • 292
Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
  • 11
    Can you actually make the assumption that the MD5 sum of any string is uniformly distributed over all possible sums? – Ori Pessach May 06 '09 at 15:22
  • 13
    Yes. Large numbers and modulous form a roughly random distribution. If they didn't, you would have constant collisions. The nature of md5 forces its output to be distributed randomly. – Stefan Kendall May 06 '09 at 15:37
  • 2
    I used your answer as a base for this answer: http://security.stackexchange.com/questions/3851/can-a-file-contain-its-md5sum-inside-it/5609#5609 – CesarB Jul 25 '11 at 02:03
  • 1
    Here, have a gold badge. – Dennis Dec 03 '15 at 22:08
  • Except that md5 is deterministic, not random. – PyRulez Jan 10 '16 at 03:02
16

My brute force attempt found a 12 prefix and 12 suffix match.

prefix 12: 54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762

suffix 12: df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78

Blog post: https://plus.google.com/103541237243849171137/posts/SRxXrTMdrFN

Thomas Egense
  • 169
  • 1
  • 3
  • 1
    The link is not working. Google plus shut down in April – Typewar Aug 01 '19 at 14:58
  • Sorry... I have not saved the blog post and google+ backup is not working for me. But here is my github project: https://github.com/thomasegense/MD5FixPointSearch – Thomas Egense Oct 08 '19 at 07:49
  • Are you sure about this : prefix 12: 54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762 I used `md5sum` linux command, I got different result – ThunderPhoenix Apr 25 '20 at 14:44
  • Not sure you using the md5sum correct then. You can also confirm it online here: http://onlinemd5.com/ – Thomas Egense Apr 26 '20 at 17:03
  • 3
    @ThunderPhoenix Make sure the input string doesn't have a trailing newline: `echo -n "54db1011d76dc70a0a9df3ff3e0b390f" | md5sum` – Laszlo Treszkai Nov 26 '20 at 17:03
11

Since the hash is irreversible, this would be very hard to figure out. The only way to solve this, would be to calculate the hash on every possible output of the hash, and see if you came up with a match.

To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 3.4 * 10 ^ 38 combinations. If it took 1 millisecond to compute a hash on a 16 byte value, it would take 10790283070806014188970529154.99 years to calculate all those hashes.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Kibbee
  • 65,369
  • 27
  • 142
  • 182
  • 2
    True, if you had to try *every one*. But you would only have to try every possible input to verify that there was no fixed point. If there is a fixed point (and Adam Rosenfield's answer suggests that there might be) then one lucky guess is all that's needed. – Naaff May 06 '09 at 15:35
  • The function is irreversible in the sense that it has no mathematical inverse, however this only means that for a given output there may be more than one input. In general, the space of inputs for a given output would be infinite, but if you know it started as a 128-bit value you can narrow down the possibilities. There is a chance for "working backwards" if you don't treat the function as a black box, but instead read the spec and apply some mathematical thinking. – rndmcnlly May 23 '09 at 01:46
  • 2
    @Naaff: "only have to try every possible input" - and this is easier than to try every hash, how? Quite the opposite, since several possible inputs may hash into the same output. – Piskvor left the building Jun 03 '09 at 19:39
  • 1
    @Piskvor: You misunderstood what Naaff meant (it took me a minute, too). A clearer way to say it would be "Only if there is no fixed point will you have try try every possible input (from the space 2^128)". In other words, you only have to try every possibility if none before that works. So 1.08e28 years, or one lucky guess! – P Daddy Jun 18 '09 at 17:07
  • 1
    "If it took 1 millisecond to compute a hash". Modern GPUs can calculate billions of hashes a second, much faster than this. But still, it would take a very long time. – markasoftware Mar 07 '17 at 17:44
1

While I don't have a yes/no answer, my guess is "yes" and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string intepretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute force search right away).

My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized to the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time, watching out for contraditions. You only need to see a few bits of the output to know that it is not matching the input!

rndmcnlly
  • 1,588
  • 1
  • 9
  • 18
-1

Probably, but finding it would take longer than we have or would involve compromising MD5.

Andru Luvisi
  • 24,367
  • 6
  • 53
  • 66
  • 7
    It hasn't been broken. All they've been able to do is, in a reasonable amount of time produce 2 strings that equate the the same hash. It's still very hard to produce a string which will equate to a specific hash. – Kibbee Oct 25 '08 at 02:30
  • 9
    not sure how finding one would compromise md5, anymore than it would compromise the algorithm if I told you MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6 – Kip Oct 25 '08 at 03:00
  • 5
    A fixed point would probably give some leverage on the maths that could lead to a more comprehensive breach of MD5. I'm not convinced that Glomek can really justify 'probably'; I would accept 'possibly' without equivocation. – Jonathan Leffler Oct 25 '08 at 04:09
-8

There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.

  • Interpretation 1: does the MD5 of a MD5 output in binary match its input?
  • Interpretation 2: does the MD5 of a MD5 output in hex match its input?
Brad Gilbert
  • 33,846
  • 11
  • 78
  • 129
Joshua
  • 40,822
  • 8
  • 72
  • 132
  • 13
    There's nothing about the MD5 algorithm that implies hex - it operates on bytes, and produces bytes - so I think the latter interpretation is invalid. – Nick Johnson May 07 '09 at 10:54
  • Whether or not there is a fixed point under interpretation 1, there could still be (or not be) one under interpretation 2. However, if you are interested in exploring the problem, interpretation 1 seems like a much better place to start because you won't have to make all sorts of arbitrary decisions about casing and character encoding. On top of that, the binary case has fewer bits! – rndmcnlly May 23 '09 at 01:44
  • 5
    You are misinterpreting what the hex really is. You can represent binary in hex, just as you can represent it in decimal or octal or base 3. It's a number, and has different representations. So, interpretation 1 and 2 are the same thing. What you're thinking of is the character string representation, which is not the same hex at all but is an entirely different binary value. In fact you could have many different hex strings in different character sets. The 128-bit hash value can be represented as a "hex" string, but it is not equal to the string. The string is not the same binary data. – defines Jun 15 '09 at 18:04
  • Dustin, interpretation 2 really does mean the MD5 of the display string. – Joshua Jun 15 '09 at 21:09
  • 5
    There is a huge problem with that idea though, in that it is directly dependent on your character encoding. Different encoding schema are going to result in entirely different result sets. There's even an entire project and an article debunking it based on this misunderstanding of how MD5 operates http://acodingfool.typepad.com/blog/2009/05/the-kembler-identity.html – defines Jun 18 '09 at 16:08
-23

Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.

Ori Pessach
  • 6,777
  • 6
  • 36
  • 51
  • 4
    No, the MD5 of 1 byte strings exists. – Joshua May 06 '09 at 15:30
  • 8
    The input can be any size. If the input is less than 512 bytes then it is padded, but small inputs are still acceptable. From Wikipedia: "MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian integers); the message is padded so that its length is divisible by 512." – Naaff May 06 '09 at 15:38
  • So you're assuming that, say, 0000000001 = 1 ? I would argue then that the question is poorly specified, at best. – Ori Pessach May 06 '09 at 15:42
  • 11
    The *input* to MD5 can be 128 bits. If MD5 wants to pad that input then, well, frankly, that's MD5's business. The input is still well defined. Likewise, the output is a well-defined 128 bits. If the (well-defined) input and the (well-defined) output are both the same then MD5(x)=x. – Naaff May 06 '09 at 15:50
  • But the padding is not well defined in the context of the transformation itself. Who says you can't pad with all 1s, or a randomly chosen, repeating bit pattern? Well, the RFC says that, but only in the context of extending a function from 512 bit numbers to 128 bit numbers to produce 128 bit numbers for arbitrarily long (or short) inputs. It only serves to further illustrate how poorly specified the question is. – Ori Pessach May 06 '09 at 16:19
  • 2
    @Joshua the MD5 of an empty string (i.e. 0 bytes) even exists – Kip Jul 27 '09 at 04:13