5

A question I've been meaning to have answered for a long time - What would be the time complexity of finding an MD5sum of a compiled binary that contains that same MD5 statically embedded in it, say, as a string?

Edit: If this wasn't already clear. I am looking for an answer with the time complexity and an explanation of it.

rook
  • 66,304
  • 38
  • 162
  • 239
atx
  • 4,831
  • 3
  • 26
  • 40

6 Answers6

1

This can be solved in constant time.

The file that contains all possible MD5 digests contains the digest of that file.

Jeremy Powell
  • 3,426
  • 2
  • 21
  • 29
  • Although, that's assuming MD5 is well defined for messages larger than 2^64, which is size of the length padding. – Jeremy Powell Apr 02 '11 at 09:21
  • I think this is an interesting point to note, but let's say your file can't include all the possible MD5 hashes? – atx Apr 02 '11 at 09:44
  • How would you generate this file in constant time? – Nick Johnson Apr 04 '11 at 23:45
  • @Nick Johnson: Well, you're right, in a way. This discussion really can't be entirely accurate without defining what the complexity is compared against. In many cases, we're interested in describing the complexity in terms of the size of the input. However, since we're actually *constructing* the input, I'm not exactly sure what it means to be 'exponential time' or 'constant time' in terms of the input. The way I was thinking about it is: given any hash algorithm *as input*, construct a file with the above property. In that case, it is constant with respect to the input size. – Jeremy Powell Apr 05 '11 at 01:29
  • ...but of course you're right. It will take you a *very long time* to write down 2^128 bits, which is the reason @malfy decided to disregard my answer :) And of course, I'm in total agreement with that decision. – Jeremy Powell Apr 05 '11 at 01:32
0

Pretty much the same as brute force: computing a md5sum is trivial, but making a file to match a known md5sum is hard.

You are, in the best case, looking for a preimage that will give you a known hash (possibly by adding data into the binary).

Piskvor left the building
  • 91,498
  • 46
  • 177
  • 222
0

It is really hard since MD5 has a good distribution. Your best bet by bruteforcing is adding some hash to your file and appending meaningless data to the end of the binary until the binary has the same hash as the one embedded.

On the other hand if you want to check whether your binary is intact and unmodified you'd be better off by splitting your binary in 3 parts: The part of the binary before the hash, the hash itself and the part after the hash. Concatenate the first and the last part, compute the md5 hash and embed it into the binary.

Like this (example):

foo098f6bcd4621d373cade4e832627b4f6bar
foo | 098f6bcd4621d373cade4e832627b4f6 | bar
md5(foo+bar) = 3858f62230ac3c915f300c664312c63f
foo3858f62230ac3c915f300c664312c63fbar
orlp
  • 112,504
  • 36
  • 218
  • 315
0

What you want to do would require a flaw in the hash algorithm. And in fact, MD5 isn't very secure, so it might be possible. But I don't see how any of the known weaknesses could help in what you want to do. Even a preimage attack wouldn't help, it would need to be a 'chosen prefix preimage attack'.

So I guess the only feasible way (today) would be to use brute force.

If you are looking a way to 'add security' to a binary, truncate the hash, and then use brute force.

Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132
0

That also depends on the circumstands. If the binary is given, and just the md5 must be inserted at some place, the answer can be: 0. Because it may be possible (and I would assume it maybe in almost all cases like this) that there is no md5 which inserted yields the same md5.

In case you have the possibility to modify more of the binary (e.g. by having padding space which you can fill with arbitrary numbers), the only feasible approach is brute force.

flolo
  • 15,148
  • 4
  • 32
  • 57
0

In order to accomplish this you must find a collision. This can be done using the prefixing attack against md5. Keep in mind this is only possible because md5 is very broken. This attack has a time complexity of 2^24.1, which is within reach of a modern desktop.

rook
  • 66,304
  • 38
  • 162
  • 239
  • You would need to find a collision, but not a typical collision (identical MD5 hashes), but a collision of hash code and (chosen) part of the message. That's much harder. – Thomas Mueller Apr 02 '11 at 19:08