Why is one function for an MD5 hash calculation preferable for smaller files yet inefficient for large files?

Question

I am generating hash values for files as a way to reject duplicate files in a small database. While researching, I found the following thread: "How to generate an MD5 checksum for a file in Android?"

Why is the first answer described as "not efficient" for large files and best suited to small strings, whereas the answer provided by dentex is better suited to large files? Is it because of how each solution was programmed, or is there a caveat with MD5 hashing that I am unaware of?

Very simply, the first answer assumes all the data is in memory to start with, whereas the second answer streams smallish chunks of data in from the file. The "efficiency" in question is memory efficiency. The first answer is arguably incorrect, in that it pointlessly assumes that the contents of the file can be held in a String. - President James K. Polk, May 10 '20 at 19:03
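To make the distinction in the comment above concrete, here is a minimal sketch of both approaches using the standard `java.security.MessageDigest` API. It is not the code from the linked thread; the class name `Md5Sketch`, the method names, and the 8 KiB buffer size are assumptions chosen for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names are illustrative, not from the linked thread.
class Md5Sketch {

    // In-memory variant: the whole input must already be held in a byte array,
    // so memory use grows with the size of the data.
    static String md5OfBytes(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return toHex(md.digest(data));
    }

    // Streaming variant: reads the file in fixed-size chunks and feeds each
    // chunk to the digest, so memory use stays constant regardless of file size.
    static String md5OfFile(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192]; // assumed chunk size
        try (InputStream in = new FileInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return toHex(md.digest());
    }

    // Formats the raw digest bytes as a lowercase hexadecimal string.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Both methods produce the same digest for the same bytes; the MD5 algorithm itself is unchanged. The difference is only in how much of the input has to be resident in memory at once.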

score 0 · Answer 1 · answered May 10 '20 at 16:53

MD5 generates a 128-bit digest.
SHA-1 generates a 160-bit digest.
SHA-2 generates a 224-, 256-, 384- or 512-bit digest.

More bits means more distinct possible digest values, which means a lower likelihood of two distinct inputs generating the same digest.
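As a rough illustration of the point about digest sizes, the sketch below queries the digest length from `MessageDigest` and applies the usual birthday approximation, p ≈ n² / 2^(b+1), for the chance of an accidental collision among n random inputs with a b-bit digest. The class and method names are hypothetical, and the approximation only holds when the result is small; it says nothing about deliberately crafted collisions, which are practical for MD5.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration only.
class DigestSizes {

    // Returns the digest size in bits; getDigestLength() reports bytes.
    static int digestBits(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
    }

    // Birthday approximation for the probability of at least one accidental
    // collision among n random inputs: n^2 / 2^(bits + 1).
    // Only meaningful when the result is much smaller than 1.
    static double approxCollisionProbability(long n, int bits) {
        return (double) n * n / Math.pow(2, bits + 1);
    }
}
```

For example, `DigestSizes.approxCollisionProbability(1_000_000L, DigestSizes.digestBits("MD5"))` estimates the accidental-collision risk for a million files in a duplicate-detection table.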

Andreas
