Salt is used to prevent a hacker from reversing the password hashes into passwords. So here we assume that somewhow the hacker has access to the database.
Without salt
Let us first assume the scenario without salt. In that case the table looks like:
user | md5 password (first 6 chars)
-------------------------------
1 | 1932ff
2 | d3b073
(we here make the situation simpler than it is in reality)
The hacker of course wants to know what the passwords behind d3b073
and 1932ff
are. A hash function is one directional in the sense that we can hash a password very fast, but unhashing it will - given it is a good hashing function - take a very long time, after guessing a huge amount of passwords.
So there is not much hope to easily retrieve the possible password(s) behind d3b073
. But we can easily find a list of the 100'000 most popular passwords, and calculate the MD5 hash of all these passwords. Such list could look like:
password | md5 (first 6 characters)
--------------------------------------------
foo | d3b073
bar | c157a7
So apparently user 2
has used foo
as password. The password of user 1
is unknown to us (but we know it is not foo
or bar
).
Now the point is that we can construct such table once and then use it to crack all passwords of all the users. Constructing such table for 100'000 passwords might perhaps take a few hours, but then we can easily retrieve all passwords. So a hacker can construct (or download) such table (there are more efficient ways, for instance with rainbow tables), and then use it each time he/she hacks a website and then obtains the passwords of all users.
With salt
If we however use salting, the table could look like this:
user | salt | hashed password
-------------------------------
1 | a91f40 | 1a604e
2 | c2a67c | b36232
So here if the password of user 2
is foo
, then we calculate the hash of fooc2a67c
(or we use another way to combine the salt and the password) and store this into the database.
The point is that it is very hard to guess the password, since b36232
is not the hash of foo
, but of fooc2a67c
and the salt is typically something (pseudo)-random. We can of course again construct the most popular 100'000 passwords with salt c2a67c
appended to it, but since we can not know the salt in advance, we can not create this table only once. Even if we are lucky and already constructed the table for salt c2a67c
, it will not help us with hacking the password of user 1
, since user 1 has a different salt.
So the only way to resolve this, is by constructing a reverse hash lookup table, for every user. Since it is usually very expensive to construct such table once, it will not be easy to calculate such table for every user.
We might of course decide to calculate all hashes of all possible salts, like for instance:
password | md5 (first 6 characters)
---------------------------------------------
foo000000 | 367390
foo000001 | eca8ea
foo000002 | 6eb7bf
foo000003 | 7906b1
foo000004 | 0e9f0c
foo000005 | 0bfb11
... | ...
But as you can see, the size of such table would grow to gigantic sizes. Furthermore it would take thousands of years. Even if we add only one hexadecimal character as salt, the size of the table would scale 16 times. Yes there are some techniques to reduce the amount of time and space for such table, but by increasing the "password space", the problem to hack passwords, will definitely be much harder. Furthermore salt is usally a signifcant amount of characters (or bytes) long making it way more harder than just 16 times more.
Basically salt acts as a way to enlarge the password space. Even if you enter the very same password on two websites, the personal salt of the websites will (close to certainty) be unique, and therefore the hash will be unique as well.