4

If we can't decode an MD5 hash string, then what is the purpose of MD5? Where can we use it?

Siddiqui
  • I'm generally not a fan of link answers, but: http://en.wikipedia.org/wiki/Cryptographic_hash_function#Applications – Frank Farmer Apr 15 '10 at 09:00
  • 6
    If you add two numbers x and y, you get a result. It is not possible to figure out what x and y were just by looking at the result. And yet addition is useful. – Daniel Earwicker Apr 15 '10 at 09:03
  • Then there is the possibility that we are adding 5+2 or 4+3. In both cases the answer is the same, but x and y are different. – Siddiqui Apr 15 '10 at 09:05
  • Daniel's example was probably chosen to show that usefulness can't be judged by whether the algorithm has a reverse function. A hash algorithm should, by design, produce different results for different inputs. Of course, collisions exist, but they **must** be rare or the hash algorithm is inefficient. – ereOn Apr 15 '10 at 09:18
  • yes.. but still.. adding is useful. – Rosdi Kasim Apr 15 '10 at 09:20

6 Answers

9

To store data safely in a database, for example.

If you store a password as its MD5 hash, you can hash the password entered in a form and compare the two hashes. They match when the password is the same, yet the password never appears in clear text in the database.

For example:

password = 123  
md5(123) === "202cb962ac59075b964b07152d234b70"

If you try to log in and enter 123 as your password, its MD5 hash will be the same, so the two hashes can be compared. But if your database is hacked, the attacker cannot read the password in clear text, only the hashed value.
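As a minimal sketch of the comparison described above, here is how it might look in Python with the standard `hashlib` module. The helper names `md5_hex` and `check_login` are made up for illustration, and a plain variable stands in for the database column. Unsalted MD5 is shown only because the answer uses it; the comments below explain why it is not enough for real password storage.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Stand-in for the value stored in the database instead of the clear-text password.
stored_hash = md5_hex("123")


def check_login(entered_password: str) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    return md5_hex(entered_password) == stored_hash


print(stored_hash)         # 202cb962ac59075b964b07152d234b70
print(check_login("123"))  # True
print(check_login("124"))  # False
```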

Jeff Noel
Robert
  • 8
    +1. Note, however, that when storing passwords, an additional random prefix should be added when computing the hash, and stored alongside the hash. This is commonly known as a _salt_. Without it, users choosing poor passwords can be exposed by identical hashes: if two users both choose "secret" as their password, the hash will be the same. A cracker with access to a set of password hashes could look for these common hashes, and thus discover the password that produced them. The salt makes it extremely unlikely that the same password will produce the same hash for different users. – Marcelo Cantos Apr 15 '10 at 09:18
  • See also this other question: http://stackoverflow.com/questions/536584/non-random-salt-for-password-hashes/ – Georg Schölly Apr 15 '10 at 10:01
  • Keep in mind that passwords shouldn't be stored as a single MD5 hash (or any other single hash), since single hash operations are too fast to easily resist brute forcing: http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html – Josh Kelley Jun 01 '10 at 13:02
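The two points raised in these comments (a random salt per user, and a hash that is deliberately slow) can be sketched together. This example uses PBKDF2 from Python's standard library as one possible slow construction; the comments do not prescribe it. The function names and the iteration count are assumptions for illustration, not recommended values.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). The salt is random and stored next to the digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Two users pick the same password but get different salts.
salt_a, hash_a = hash_password("secret")
salt_b, hash_b = hash_password("secret")

print(hash_a != hash_b)                           # digests differ because the salts differ
print(verify_password("secret", salt_a, hash_a))  # correct password is accepted
print(verify_password("guess", salt_a, hash_a))   # wrong password is rejected
```

Because each stored digest depends on its own salt, an attacker can no longer spot users who share a password by looking for repeated hashes.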
5

An encrypted file, one that can be decrypted again, is always at least as big as the original file; a hash is much, much smaller.

This allows us to create a hash from a file that can prove the integrity of the file, without storing the file itself.

There are many reasons not to store the file in encrypted form or in plain text:

  • As soon as an encrypted file falls into the wrong hands, they could try to decrypt it. There's no chance that's going to happen with a hash.

  • You simply don't need the file yourself, but maybe you're sending it to someone, and that person can verify its integrity using the hash.
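To illustrate the size argument, this small sketch hashes a five-byte input and a one-million-byte input with Python's `hashlib`. The inputs are arbitrary examples; the point is only that the MD5 digest is 16 bytes in both cases.

```python
import hashlib

small = b"hello"
large = b"x" * 1_000_000  # one million bytes

for data in (small, large):
    digest = hashlib.md5(data).digest()
    print(len(data), "bytes ->", len(digest), "byte digest")

# 5 bytes -> 16 byte digest
# 1000000 bytes -> 16 byte digest
```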

Georg Schölly
  • +1. Are you sure about the *at least as big* statement? Can't we imagine a cipher algorithm that uses something like a "compression method"? (I'm not talking about ciphering then compressing, but about a hypothetical algorithm that could produce smaller ciphered data.) Just wondering. – ereOn Apr 15 '10 at 09:12
  • 2
    In order to obtain meaningful compression, you'd have to compress the data first, then encrypt it. (Why? Because well-encrypted data is essentially random and compression relies on patterns in the data, so it doesn't work on random data.) In that case, the data being encrypted is the compressed form of the original data and Georg's statement that the encrypted data must be at least as large as the data being encrypted (which, remember, is now the compressed form) still holds. – Dave Sherohman Apr 15 '10 at 09:15
  • @ereOn: That would be a compression algorithm. But imagine you've got already compressed input data, in that case you couldn't compress it even more. (If that were possible, you could compress data indefinitely.) – Georg Schölly Apr 15 '10 at 09:18
  • Yep, it makes sense. Thanks to both of you ;) – ereOn Apr 15 '10 at 09:33
  • 1
    There is in fact no such thing as a reversible compression function - *on average*, the output of every reversible function is also at least as big as the original file. A "compression algorithm" is actually an "expansion algorithm" with some interesting failure cases :) – caf Apr 15 '10 at 12:01
  • @caf: This definition recently popped up on comp.lang.c for some reason. Was it original with you? – Keith Thompson Jun 21 '16 at 20:51
  • @KeithThompson: No, I was paraphrasing something I'd heard many years before. I'm afraid I don't have much recollection of the original source - it may have been on comp.compression? – caf Jun 22 '16 at 11:56
2
  • It allows you to determine whether the data you have (e.g., an entered password) is the same as some other data which is secret (e.g., the correct password) without requiring access to the secret data. In other words, it can be used to determine "is this user-entered password correct?" while also keeping the correct password secret. (Note that there are stronger hashing methods out there which should be used instead of md5 for this purpose these days, such as sha* and bcrypt. With modern hardware, it's fairly easy to throw millions of passwords per second at an md5 hash until you find one that matches the correct password.)

  • It allows you to verify the integrity of a transmitted file by comparing the md5 hash of the original file with the md5 hash of the data that was received. If the hashes are different, the received data was not the same as the sent data, so you know to re-send it; if they're the same, you can be reasonably certain that the sent and received data are identical.
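The file-transfer check in the second point might look roughly like this. The file names and contents are invented for the demonstration, and temporary files stand in for the sender's and receiver's copies. `file_md5` is a hypothetical helper that reads the file in chunks so large files need not fit in memory.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sent = os.path.join(tmp, "sent.bin")
    received_ok = os.path.join(tmp, "received_ok.bin")
    received_bad = os.path.join(tmp, "received_bad.bin")

    payload = b"some file contents\n" * 1000
    with open(sent, "wb") as f:
        f.write(payload)
    with open(received_ok, "wb") as f:
        f.write(payload)
    with open(received_bad, "wb") as f:
        f.write(payload[:-1] + b"X")  # one byte changed in transit

    intact = file_md5(sent) == file_md5(received_ok)
    corrupted = file_md5(sent) != file_md5(received_bad)

print(intact)     # True: hashes match, no need to re-send
print(corrupted)  # True: hashes differ, so the transfer should be repeated
```

As the answer notes, a match only makes it reasonably certain that the data is identical. This check detects accidental corruption; it does not by itself stop someone who can replace both the file and its published hash.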

Dave Sherohman
2

MD5 is mainly used to verify the integrity of files when they are sent from one machine to another, to detect whether a man in the middle or any other third party has modified the contents of the files.

A basic example: when you download a file from a server, the server has an MD5 hash calculated for it. When the file reaches you, its MD5 hash is calculated again. If the two hashes match, the file has not been corrupted or modified by a third party.

1

Good hash functions like MD5 can be used for identification. See this question. Under certain conditions you can assume that equal hashes mean equal data blocks.
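One way to use hashes for identification is to key data blocks by their digest, so that repeated blocks are stored only once. The sketch below assumes the "certain conditions" mean that nobody is deliberately crafting colliding inputs; the block contents are invented for illustration.

```python
import hashlib

blocks = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]

# Map each digest to the first block that produced it.
unique = {}
for block in blocks:
    key = hashlib.md5(block).hexdigest()
    unique.setdefault(key, block)

print(len(blocks), "blocks,", len(unique), "unique")  # 5 blocks, 3 unique
```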

Community
sharptooth
  • 6
    *Good hash functions like MD5* seems a little bit outdated to me. – Georg Schölly Apr 15 '10 at 09:20
  • @Georg: MD5 has been hacked so it should not be used for scenarios when someone would want to subvert it. Otherwise it is still good. – sharptooth Apr 15 '10 at 09:30
  • 1
    Not really. DSPs (like, say, your video card's GPUs) can brute-force md5 in no time flat because it's too simple to calculate. The code at http://bvernoux.free.fr/md5/index.php, for example, claims to process 200 million md5 hashes per second on common consumer hardware (GeForce 8800GT and Core2Duo E6750 using one core). It's so easy to brute-force that there's no real point in making algorithmic attacks against it any more. – Dave Sherohman Apr 15 '10 at 09:39
  • @Dave Sherohman: Didn't know of those. Still, this doesn't prevent using MD5 in scenarios where there's no one to subvert it. – sharptooth Apr 15 '10 at 09:42
  • @sharptooth: If there's no one to subvert it, you are not paranoid enough ;) – Piskvor left the building Aug 18 '10 at 12:54
  • @Piskvor: Seriously, I didn't mean cases where the protected data is so useless that no one wants to hack into it. I meant scenarios where there is no attacker, like this one: http://stackoverflow.com/questions/862346/how-do-i-assess-the-hash-collision-probability – sharptooth Aug 18 '10 at 12:59
  • @sharptooth: I see. I'd personally go with SHA-256 - not too much slower; but I guess MD5 would be good enough there. – Piskvor left the building Aug 18 '10 at 13:23
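The brute-force concern raised in this thread can be shown on a toy scale. The sketch tries every digit string up to four characters against the MD5 hash of "123" quoted in the first answer. The search space and the `brute_force` helper are assumptions chosen to keep the example tiny; real attacks use far larger candidate sets and dedicated hardware.

```python
import hashlib
from itertools import product
from string import digits

target = "202cb962ac59075b964b07152d234b70"  # MD5 of "123", from the first answer


def brute_force(target_hex: str, max_len: int = 4):
    """Try every digit string up to max_len characters; return the match or None."""
    for length in range(1, max_len + 1):
        for combo in product(digits, repeat=length):
            candidate = "".join(combo)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None


found = brute_force(target)
print(found)  # 123
```

Nothing is "decoded" here: the hash is never reversed, the attacker simply guesses inputs until one produces the same digest. That is why a fast hash is a poor choice for passwords.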
0

MD5 is a hash function, and there are others like it, such as SHA, PBKDF2, bcrypt and scrypt. I really prefer scrypt. Hash functions are used for integrity checks, in order to detect any manipulation that may have occurred during the transmission of the actual message. The receiver can tell whether the received message has changed by checking the hash value of the message.

These functions have three security properties: 1) It is difficult for someone to recover the actual message m when they only have h(m). 2) Given a message m and its hash value, it is difficult to find another message with the same hash value. 3) Last, it is difficult to find two different messages m1, m2 with the same hash value.
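To see why the third property depends on the size of the output, here is a toy experiment. `tiny_hash` is a made-up function that keeps only the first 2 bytes of an MD5 digest, so there are just 65,536 possible values and a collision must appear within 65,537 attempts. This says nothing about how collisions in full MD5 are found; it only shows what a collision is.

```python
import hashlib


def tiny_hash(message: bytes) -> bytes:
    """Deliberately weak toy hash: only the first 2 bytes of the MD5 digest."""
    return hashlib.md5(message).digest()[:2]


seen = {}  # maps each tiny hash to the first message that produced it
collision = None
i = 0
while collision is None:
    message = str(i).encode("utf-8")
    h = tiny_hash(message)
    if h in seen:
        collision = (seen[h], message)
    else:
        seen[h] = message
    i += 1

m1, m2 = collision
print(m1, m2, tiny_hash(m1).hex(), "found after", i, "messages")
```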

Also, it is important to know that hash function algorithms are public and that it is very easy to compute the hash value of a message. Moreover, hashes are "one-way" functions, meaning that it is hard to find the message given only its hash. The actual security is thus based on that property.

Keith Thompson
thrylos