3

I am designing a file-server application where I want to check if a cached file on a client computer is the last version who is kept on the server.

I don't quite trust the 'changed date' attribute in the file system, so I want to compare the actual bytes in the file.

I think the fastest way to do this (as sending all the bytes across the web takes some time), is to send the file length and hash bytes to the server. Then the server checks the file length first, and if they match, it computes a hash for the file located on the server, and then checks if it is the same that the client computed.

Can anybody tell me what the how probable the hash collisions are when the file size is the same? (I am currently using MD5 for its speed).

Can I assume if the file size is the same and the hash is the same that the content is the same?

Thanks!

1 Answers1

3

Random collisions in MD5 are so improbable that its almost certainly safe to ignore the possibility.

However MD5 has been shown to be cryptographically weak so a malicious adversary could deliberately create files that collide. A famous example is:

On 30 December 2008, a group of researchers announced at the 25th Chaos Communication Congress how they had used MD5 collisions to create an intermediate certificate authority certificate which appeared to be legitimate when checked via its MD5 hash.

Source

Benni
  • 778
  • 8
  • 22
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 1
    Thank you! The malicious aspect of the problem is not a concern :) Do you know an order of magnitude for the collision? Like one to what? – Jakob Høgenes Dec 09 '10 at 20:43
  • 1
    @Jakob: For two specific files to collide the probability is roughly 1 in 340282366920938463463374607431768211456. The chance of a collision in a set of files is more (but still extremely, incredibly, unbelievable, astonishingly unlikely). – Mark Byers Dec 09 '10 at 20:51
  • I am quite happy when the chances are that low! Again, thanks! – Jakob Høgenes Dec 09 '10 at 20:56
  • To be slightly more generic, for an N bit cryptographically secure hash, the chances of collision are pretty close to 1 in 2^N. – Slartibartfast Dec 11 '10 at 08:33