I want to hash user-uploaded files

and then save the hash to the database.

This helps me prevent duplicate uploads.

I was wondering which hash algorithm I should use.

I googled it, and someone said crc32b is better and faster than md5.

Will it not have dynamic errors?

If I don't use hash_file('crc32b'), should I use md5_file or sha1_file?

jk jk
  • Sorry, I'm not sure of the purpose of this question. You have been given an answer by your own searching, so now you are just asking the same thing here? – Anigel Jul 05 '13 at 06:53
  • I don't know which of crc32b or md5 is better for preventing duplicate files. Is the answer crc32b? – jk jk Jul 05 '13 at 06:55
  • possible duplicate of [When is CRC more appropriate to use than MD5/SHA1?](http://stackoverflow.com/questions/996843/when-is-crc-more-appropriate-to-use-than-md5-sha1) – Robert Jul 05 '13 at 06:59
  • So why do you doubt what you have already found? – Anigel Jul 05 '13 at 07:01
  • How do you define better? Is it in terms of speed or probability of collisions? As MD5 gives a 128-bit hash while crc32b gives a 32-bit hash, the risk of getting false positives is far lower with MD5. However, crc32b is faster. – Terje D. Jul 05 '13 at 07:03
  • I want the hash to be exact and unchanging. I read someone saying md5 may return a wrong hash, so is crc32b better for comparing hashes? – jk jk Jul 05 '13 at 07:06
  • In your case, to make fingerprints of uploaded files, I would go for SHA-1. Huge systems like Git rely on the uniqueness of SHA-1 hashes, and I have never heard of a wrongly versioned file because of a hash collision. That MD5 generates a wrong hash is surely a joke; MD5 is an algorithm and cannot produce "wrong" hashes. – martinstoeckli Jul 05 '13 at 11:24
  • But according to this link, http://stackoverflow.com/questions/2293902/is-sha-sufficient-for-checking-file-duplication-sha1-file-in-php, viraptor replied that SHA-1 has weaknesses. – jk jk Jul 05 '13 at 11:43
  • @jkjk - SHA-1 is more collision resistant than MD5, and that's what you need in your case. The mentioned weakness would only be a problem if an attacker tried to make slight changes to a file and could then store the file with the same hash value. While it may become possible in the future to find a file with the same hash value, it is nearly impossible to control the changes in such a collision file; it would contain only binary rubbish data. – martinstoeckli Jul 05 '13 at 12:49

1 Answer

A CRC-32 is much faster, and can be used to rule out a match in most cases. If you get a hit with the CRC, you can then apply a larger signature to check that it really is a match. Depending on the volume of your traffic, it is quite possible that you will get false-positive matches with just a CRC. Use SHA-256 to check whether it really is a match, and only reject the upload on that basis.
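
To put a rough number on "quite possible", a birthday-bound estimate can help. It assumes, as an idealization, that the CRC-32 values of distinct files behave like uniformly random 32-bit numbers. With $n$ distinct files stored, the chance that at least two of them share a CRC is approximately:

$$
P(\text{at least one false match}) \approx 1 - e^{-n(n-1)/2^{33}}
$$

Under that assumption the probability reaches about 50% at roughly 77,000 files, while the same estimate for a 256-bit hash stays negligible for any realistic $n$.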

There is still an extremely small probability that you will get a false-positive with a SHA-256 as well. For your application however, you might just as well accept preventing a user from uploading a genuinely new file in that very rare case.
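
The two-stage check described above can be sketched roughly as follows. This is a minimal illustration in Python, not a definitive implementation: the `UploadIndex` class and the `crc32_of` and `sha256_of` helpers are hypothetical names, and the in-memory dictionaries stand in for the database table. In PHP the counterparts would be `hash_file('crc32b', ...)` and `hash_file('sha256', ...)`.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large uploads are not held in memory


def crc32_of(path):
    """Cheap first-stage fingerprint: CRC-32 of the file contents."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def sha256_of(path):
    """Slower second-stage fingerprint: SHA-256 hex digest of the file contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


class UploadIndex:
    """Hypothetical stand-in for the database table of stored uploads."""

    def __init__(self):
        self._paths_by_crc = {}  # CRC-32 value -> paths of stored files with that CRC
        self._sha_by_path = {}   # stored path -> cached SHA-256 digest

    def register(self, path):
        """Return the path of an existing duplicate, or store `path` and return None."""
        crc = crc32_of(path)
        candidates = self._paths_by_crc.get(crc, [])
        if candidates:
            # CRC hit: confirm with SHA-256 before rejecting the upload.
            digest = sha256_of(path)
            for stored in candidates:
                if stored not in self._sha_by_path:
                    self._sha_by_path[stored] = sha256_of(stored)
                if self._sha_by_path[stored] == digest:
                    return stored
        # No CRC hit, or a CRC false positive: treat the file as new.
        self._paths_by_crc.setdefault(crc, []).append(path)
        return None
```

With this layout the SHA-256 is only computed when a CRC matches, so most new files cost a single CRC pass. A real system would store both values in indexed database columns rather than in dictionaries.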

Mark Adler