
I have sometimes seen strings and associative-array keys stored as MD5 hash values, and have had this practice recommended to me. I have learnt about hashing from MIT OCW 6.046J, and hashing seems more like a scheme for storing data in a format that allows fast searching and prevents people from recovering the original. But don't languages that support associative arrays/dictionaries already do this internally? What additional advantage does an MD5 hash give?

martianwars

1 Answer


Most languages do support this internally. For example, see Java's hashCode(), which is used when storing keys in a HashMap:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
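Python's dict works on the same principle (the example below is Python rather than Java): it calls a key's __hash__ to pick a slot and __eq__ to confirm a match. As a minimal sketch, here is a hypothetical key class whose instances all hash to the same value. The dict still keeps the entries apart, because it stores the full original key and compares with __eq__.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        # Deliberately terrible hash: every instance lands in the same slot chain.
        return 42

    def __eq__(self, other: object) -> bool:
        # The dict falls back to equality to tell colliding keys apart.
        return isinstance(other, CollidingKey) and self.name == other.name


table = {CollidingKey("alice"): 1, CollidingKey("bob"): 2}
print(len(table))                  # 2: both entries survive the collision
print(table[CollidingKey("bob")])  # 2
```

Because the built-in table keeps the original key, it can use a small, fast hash and tolerate collisions. When you store only an MD5 digest in place of the original (as in the scenarios below), there is nothing to fall back on, so collisions must be practically impossible.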

But there are scenarios where you want to do it yourself.

Scenario 1 - using a hash as a key in a database:

Suppose you have a large NoSQL-style database of letters and their metadata. You want to find a letter's metadata quickly, without searching. What would your index be?

One option is a running (auto-incrementing) index that is unrelated to the letter's content, but then, given a letter, you have to search the database before you can find its metadata. Another option is to create a signature for the document from its prefix (just one example of many), but many documents may share that prefix ("Dear John,").

So how about taking the entire document into account? That's where you can use the document's MD5 digest as its row key.
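A minimal sketch of this scheme in Python, assuming a plain dict stands in for the database; row_key, store, and find_metadata are hypothetical names, not part of any database API. The two sample letters both start with "Dear John," but still get different keys, because the digest covers the whole text.

```python
import hashlib
from typing import Dict, Optional


def row_key(document: str) -> str:
    """Content-derived row key: hex MD5 digest of the UTF-8 encoded document."""
    return hashlib.md5(document.encode("utf-8")).hexdigest()


# A plain dict stands in for the hypothetical NoSQL table (row key -> metadata).
metadata_table: Dict[str, dict] = {}


def store(document: str, metadata: dict) -> str:
    key = row_key(document)
    metadata_table[key] = metadata
    return key


def find_metadata(document: str) -> Optional[dict]:
    # No scan needed: recompute the key from the content and look it up directly.
    return metadata_table.get(row_key(document))


letter_a = "Dear John,\nThe shipment arrives on Monday."
letter_b = "Dear John,\nPlease find the invoice attached."
store(letter_a, {"sender": "Alice"})
store(letter_b, {"sender": "Bob"})
print(find_metadata(letter_b))  # {'sender': 'Bob'}
```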

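In this scenario you're relying on there being no collisions (two different documents with the same digest). The usual argument for this assumption is that your chances of running into a demented gorilla are (usually) greater than your chances of an accidental collision. That argument covers accidental collisions only: MD5 collisions can be constructed deliberately, so don't rely on MD5 keys when an attacker can choose the documents. Members of the Secure Hash Algorithm family with longer digests, such as SHA-256, make accidental collisions even less likely.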
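The "no collisions" assumption is usually quantified with the birthday bound. Assuming digests behave like independent, uniformly random b-bit values (which models accidental collisions, not deliberately crafted ones), the probability that any two of n documents share a digest is approximately 1 - exp(-n(n-1) / 2^(b+1)). The sketch below evaluates that approximation; the function name and the billion-document figure are illustrative assumptions.

```python
import math


def collision_probability(n_items: int, digest_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision among n_items digests).

    Assumes digests behave like independent, uniformly random digest_bits-bit values.
    """
    pairs = n_items * (n_items - 1) / 2   # number of distinct pairs of documents
    expected = pairs / 2 ** digest_bits   # expected number of colliding pairs
    return -math.expm1(-expected)         # 1 - exp(-expected), accurate for tiny values


for name, bits in [("MD5", 128), ("SHA-256", 256)]:
    print(name, collision_probability(10**9, bits))
```

For a billion documents this comes out on the order of 1e-21 for MD5 and vastly smaller for SHA-256. Using expm1 instead of 1 - math.exp(...) matters here: for such tiny exponents, 1 - exp(-x) would round to exactly 0.0.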

I mention this because databases usually don't derive keys from content for you out of the box (some frameworks may).

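Scenario 2 - one-way hashing for password storage:

Note: MD5 should no longer be used for this, and a single pass of a fast SHA-family hash such as SHA-256 is also a weak choice today. Prefer a salted, deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2.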

In this scenario you want to store passwords in your database, but storing them as plain text is risky if the database is compromised: users often reuse passwords across sites, so a leak can compromise their accounts on other sites as well. Instead, you store a hash of each password, and when a user attempts to log in you hash what they typed and compare it with the stored hash, never with the password itself. This way the password is never stored, and an attacker who steals the database has to crack each hash to recover it.
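A minimal sketch of that log-in check in Python, using the standard library's PBKDF2 (hashlib.pbkdf2_hmac) with a random per-user salt rather than a bare MD5 or SHA digest. hash_password and verify_password are hypothetical helper names, and the iteration count is an illustrative assumption that you should set from current guidance and your hardware.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune it, higher means slower to brute-force


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the password."""
    salt = secrets.token_bytes(16)  # random per-user salt defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempted password with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison, so timing doesn't reveal how many bytes matched.
    return hmac.compare_digest(candidate, stored_digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("guess", salt, digest))                         # False
```

The salt is stored alongside the digest and does not need to be secret; its job is to make identical passwords hash differently, so one cracked hash doesn't reveal every user with the same password.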

Reut Sharabani