752

Coda Hale's article "How To Safely Store a Password" claims that:

bcrypt has salts built-in to prevent rainbow table attacks.

He cites this paper, which says that in OpenBSD's implementation of bcrypt:

OpenBSD generates the 128-bit bcrypt salt from an arcfour (arc4random(3)) key stream, seeded with random data the kernel collects from device timings.

I don't understand how this can work. In my conception of a salt:

  • It needs to be different for each stored password, so that a separate rainbow table would have to be generated for each
  • It needs to be stored somewhere so that it's repeatable: when a user tries to log in, we take their password attempt, repeat the same salt-and-hash procedure we did when we originally stored their password, and compare

When I'm using Devise (a Rails login manager) with bcrypt, there is no salt column in the database, so I'm confused. If the salt is random and not stored anywhere, how can we reliably repeat the hashing process?

In short, how can bcrypt have built-in salts?

brian d foy
  • 129,424
  • 31
  • 207
  • 592
Nathan Long
  • 122,748
  • 97
  • 336
  • 451

5 Answers5

946

This is bcrypt:

Generate a random salt. A "cost" factor has been pre-configured. Collect a password.

Derive an encryption key from the password using the salt and cost factor. Use it to encrypt a well-known string. Store the cost, salt, and cipher text. Because these three elements have a known length, it's easy to concatenate them and store them in a single field, yet be able to split them apart later.

When someone tries to authenticate, retrieve the stored cost and salt. Derive a key from the input password, cost and salt. Encrypt the same well-known string. If the generated cipher text matches the stored cipher text, the password is a match.

Bcrypt operates in a very similar manner to more traditional schemes based on algorithms like PBKDF2. The main difference is its use of a derived key to encrypt known plain text; other schemes (reasonably) assume the key derivation function is irreversible, and store the derived key directly.


Stored in the database, a bcrypt "hash" might look something like this:

$2a$10$vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa

This is actually three fields, delimited by "$":

  • 2a identifies the bcrypt algorithm version that was used.
  • 10 is the cost factor; 210 iterations of the key derivation function are used (which is not enough, by the way. I'd recommend a cost of 12 or more.)
  • vI8aWBnW3fID.ZQ4/zo1G.q1lRps.9cGLcZEiGDMVr5yUP1KUOYTa is the salt and the cipher text, concatenated and encoded in a modified Base-64. The first 22 characters decode to a 16-byte value for the salt. The remaining characters are cipher text to be compared for authentication.

This example is taken from the documentation for Coda Hale's ruby implementation.

erickson
  • 265,237
  • 58
  • 395
  • 493
  • 7
    Would you have more details as to why cost factor of 10 would not be enough? In Grails, I noticed that 10 is the default value for cost factor/log rounds for bcrypt so it might be worth updating given your suggestion. – pm_labs May 14 '12 at 22:52
  • @erickson a cost of 16 takes almost 10 seconds on my host .. I can't have that every time someone's logging in. Is my hardware really just that crappy? – Explosion Pills Jun 21 '12 at 00:54
  • @ExplosionPills what language and library are you using? If an interpreter is executing most of the algorithm it could appear much slower. – erickson Jun 21 '12 at 02:26
  • @erickson `php` with `crypt` (`CRYPT_BLOWFISH`). Am I screwed? – Explosion Pills Jun 21 '12 at 02:43
  • @ExplosionPills I'm afraid I don't know much about PHP. From what I understand, most PHP deployments nowadays use pre-compiled byte-code, which should be fairly fast. Do you know whether this `crypt()` function is implemented by more PHP code, or it is actually implemented by a compiled library (written in C or C++ and compiled for the host architecture)? If the latter, it seems like it should be fast. – erickson Jun 21 '12 at 17:19
  • @ExplosionPills I've done some benchmarking, and the `bcrypt` key generation algorithm *is* very slow, much slower than the PBKDF2 with SHA-1 HMAC I had assumed was comparable when I recommended a cost factor. `bcrypt` with a cost factor of 16 takes about 5.6 seconds on my machine. In that time, I can do almost 4 million PBKDF2 iterations. So, the cost factor for bcrypt could be safely lowered, maybe to 12. This should take less than 1 second, even on your host. – erickson Jun 21 '12 at 19:04
  • 62
    The cost factor for bcrypt is exponential, or rather, a cost factor of 10 means 2^10 rounds (1024), a cost factor of 16 would mean 2^16 rounds (65536). It's natural then that it would take 5-10 seconds. It should take about 64 times as long as a cost factor of 10 does. To clear up other misinformation, PHP's crypt function uses the unix crypt library which is implemented in c. – thomasrutter Jul 03 '12 at 13:02
  • @thomasrutter Thanks for answering my question about PHP's `crypt()` implementation. I am not aware of any "misinformation" though. Can you clarify? – erickson Jul 03 '12 at 15:48
  • Sorry, just referring to your speculation that crypt may be implemented as PHP and executed as bytecode, which it isn't (it calls a C library). I realise you weren't stating that as fact though because you weren't sure. Just clearing it up. – thomasrutter Jul 04 '12 at 03:23
  • The mode of operation makes sense, but if I have access to the database and can generate a password for me (in Devise), it seems I can now copy that to any other users encrypted password field entry and I "know" their password. What am I missing? – TJChambers Aug 06 '14 at 22:33
  • 4
    @TJChambers That's right; if you can set the password on the account, you will be able to authenticate. Password hashing isn't intended to prevent that attack. It's meant to prevent an attacker with read-only access to the password table from authenticating. For example, you get a backup tape with the table on it. – erickson Aug 06 '14 at 22:38
  • thanks @erickson. I guess the theory for me (if I understand it correctly) is if someone breaks in far enough to get access to the password database itself and can write to the database, AND they can establish their own password to a known account (their own), then they can access anyone elses account by copying their encrypted password to that account and logging in. At least until the other party realizes they cannot log in and resets their password. It seems like including the primary key (i.e. username or email) into the embedded salt process would eliminate that. – TJChambers Aug 07 '14 at 23:28
  • I have to assume the attacker can compute hashes with the same known inputs. Otherwise I'm relying on "security through obscurity" rather than strong cryptography. To thwart an attacker who can write to the table, I'd need to be able to hide some master secret that could be used to authenticate information in the table. For example, compute a digital signature or MAC on the table or its entries. And keeping the master secret hidden from such an attacker would be hard. – erickson Aug 07 '14 at 23:41
  • If I understand you correctly, @erickson, then it would require an "unknown" input to prevent access. Or at least an unknown verifying outcome (like a digital signature), to at least detect the intrusion. – TJChambers Aug 11 '14 at 15:22
  • @TJChambers That's right. But most password authentication systems don't incorporate this, because if an attacker has write access, you've already lost. – erickson Aug 11 '14 at 16:46
  • @erickson Question with reference to this statement-- "The main difference in its use of a derived key to encrypt known plain text; other schemes (reasonably) assume the key derivation function is irreversible, and store the derived key directly." So is it accurate to say that breaking bcrypt would depend on somehow breaking through two algorithms-- both the KDF and the cipher, whereas something like PBKDF2 only requires the KDF to be broken? And therefore bcrypt is *arguably* more secure in that sense? – temporary_user_name Oct 29 '16 at 07:02
  • @Aerovistae Yes, bcrypt can only be "broken" (in the sense of there being a shortcut to recover a password) if the cipher and the key derivation function are each broken. – erickson Oct 30 '16 at 00:45
  • 3
    Isn't storing the salt with the chiper bad security? If someone gets their hand on the hash, with enough computing it can be cracked. If he doesn't know the salt it's virtually impossible. – LobsterMan Jan 02 '18 at 11:52
  • 19
    @LobsterMan No, not really. If you could keep a secret, you wouldn't use this approach, you'd just store the password. Password authentication schemes are based on the assumption that the attacker has discovered everything that you know. The salt is there to require each password to be attacked individually. The computational effort required for testing passwords is governed by the iterations. If users pick good passwords, they will be secure, even if when the salt is revealed. Hiding the salt could help someone with a bad password in some cases, but I'd work on password quality first. – erickson Jan 02 '18 at 13:20
  • I still don't get it. What is ''well known string" here? – NLV Feb 22 '18 at 18:28
  • 2
    @NLV It's a string defined in the bcrypt specification: `"OrpheanBeholderScryDoubt"` – erickson Feb 22 '18 at 18:48
  • 1
    Ah! It all makes sense now. Also this helped - https://crypto.stackexchange.com/a/41958 – NLV Feb 22 '18 at 19:04
  • Putting in plain English something that was not obvious to me: when you run the encryption function twice, you'll get different result. It is totally randomized. So one cannot build a dictionary of known encrypted password to compare. Encrypting "password" can give me `$2b$10$JV0P0SrhI5RwMnsQPLv3u.GpNamTdiDYCRIH/mM24lGq1Z89i5qty`, but then `$2b$10$jbxd1Y6RniQ0AXVYJrRIqu2rXLMELfU0Gw7bbiN42/CzANIgRAg1O` for example. – Eric Burel Mar 01 '21 at 17:28
205

I believe that phrase should have been worded as follows:

bcrypt has salts built into the generated hashes to prevent rainbow table attacks.

The bcrypt utility itself does not appear to maintain a list of salts. Rather, salts are generated randomly and appended to the output of the function so that they are remembered later on (according to the Java implementation of bcrypt). Put another way, the "hash" generated by bcrypt is not just the hash. Rather, it is the hash and the salt concatenated.

Adam Paynter
  • 46,244
  • 33
  • 149
  • 164
  • Related: http://stackoverflow.com/questions/5881169/storing-a-hashed-password-bcrypt-in-a-database-type-length-of-column/5882472#5882472 – Gumbo Jul 26 '11 at 15:40
  • 33
    OK, so I sign up for a site and choose a the password "foo". `Bcrypt` adds a random salt of "akd2!*", resulting in "fooakd2!*", which is hashed and stored. Later, I try to sign in with password "bar". To see if I'm correct, it needs to hash "barakd2!*". If the salt was generated randomly to start with, how does it know how to add it back to "bar" before hashing and comparing? – Nathan Long Jul 26 '11 at 15:40
  • 57
    @Nathan: `bcrypt` knows how to extract the salt back out of the generated output (which is stored in the database). When it comes time to authenticate, `bcrypt` separates the original output into its hash and salt components. The salt component is applied to the incoming password typed by the user. – Adam Paynter Jul 26 '11 at 15:46
  • 33
    To answer Nathan Long's comment, a good way of thinking of this is that salts are not meant to be secret. This is why the salt is included in the output from the bcrypt function as one of the answers pointed out above. The salt is there to prevent rainbow tables, which are lists of common passwords, or just brute force, etc... of different passwords but hashed. Without salt, the hash for a password in database A would be the same as a hash for a password in database B. Salt merely changes up the hash values making it harder for someone who stole the database to decrypt (unhash) passwords. – Joseph Astrahan Jan 05 '16 at 16:28
  • 12
    @Nathan but can an attacker just remove the known salts in all passwords and then create a table with them? – Oscar Aug 11 '16 at 22:24
  • 1
    I am waiting for reply on @Oscar comment. Also I still didn't get it the final statement. So, while matching the incoming password, we just need to call the same function (used while encrypting) and it will generate the same hashed password again? Is this correct statement or am I missing something? – amarmishra Jan 11 '17 at 10:31
  • @amarmishra see Joseph's comment: the password is still stored with the salt applied and the salt is a random value for each and every user. That means that an attacker cannot use a pre-existing Rainbow Table. He/She would have to brute force each and every password separately. – Korgen Feb 10 '17 at 11:51
  • 1
    But say a user is a president, and his salt is known, a rainbow table can be generated for that salt specifically, right? It would take very long of course, but a secret-ish salt sounds better to me in this case. – Stephan Bijzitter Mar 14 '17 at 21:26
  • 1
    You don't need to build a rainbow table to crack one password. If you want to store a salt somewhere else than in your database, go ahead and add your own second salt. Storing it inside the same database as the passwords is pointless, though. – Emil Vikström Jun 26 '17 at 09:10
  • @JosephAstrahan if someone has stolen the database, then they could look into the hashes and use the correct salts for their rainbow attack? Or am I missing something? – Justus Romijn Aug 04 '17 at 19:47
  • Ah nevermind. They point is: a rainbow table is a list of encrypted passwords (without salt), so it is a precomputed list of hashes. By using salts, these tables are worthless. However, with enough computing power, and when they have your database, they can still generate encryption values based on common passwords plus the salts that are now known. So it is a lot more effort to crack, so it buys you some time to mitigate the issue and inform affected users. – Justus Romijn Aug 04 '17 at 19:52
  • 4
    This is how I understand it: The idea is that every password has an unique salt. The salt in incorporated in the password hash so a hacker would have to create a rainbow table for every password. This would take an enormous amount of time for a moderate database. It's all about slowing an attacker down and thus making brute-forcing pointless. – PVermeer Apr 02 '18 at 14:52
  • 6
    What stops the hacker from just removing the salt from the hash, and just compare his unsalted hash with it? – Kevin Frostad Sep 11 '18 at 12:19
  • @KevinFrostad The way I understand it, this is not possible because the hash is not a concatenation of the hash and salt. The salt itself is hashed (i.e. the concatenation process of adding the salt to the password happens before the hashing). To put it another way, even if the salt is 'salt', chances are near zero that the string 'salt' will be present in the hash of the salted password (and if it is, it is random, not the actual salt). – Hemal Sep 24 '18 at 12:13
  • 1
    @Hemal If you run this relp.it [link](https://repl.it/@frokev/pswd-hashing-bcrypt) you can see that bcrypt actually just adds the generated salt directly to the hash. You can see that because the start of the two different hash strings with the same salt (username and password) are exactly the same for both when you generate a salt. If it were mixed in with the hash the output should have been different when the hash is. But it's not, only the end of the string containing the hash is :s – Kevin Frostad Nov 15 '18 at 14:16
  • @KevinFrostad You're right - It was a while ago now (I've been reading a fair bit on it since) so I don't know whether I was straight up misunderstanding things when I initially answered that or whether I phrased it badly - apologies either way. But to go back to your original qn: What stops the hacker from just removing the salt from the hash, and just compare his unsalted hash with it? Please correct me if I am wrong (I am unsure of what 'it' refers to), but I am understanding your question to be a comparison of the salted hash vs the unsalted hash right?... 1/2 – Hemal Nov 19 '18 at 03:57
  • 1
    2/2 If a db of bcrypt passwords was hacked, the hacker would have, at that point in time, the salted hash - from which he can extract the salt (as your REPL shows) and the hashed password. However, he would not have the password itself, nor would he have the hash of the password without the salt - i.e. There would be nothing for him to compare the salted hash to that would lead to the original password used to create the hash. To put it another way - the salt + pwd = salted hash. The hashing is a one way function that does not allow the hacker to generate the pwd, even with the salt. – Hemal Nov 19 '18 at 04:04
  • @Hemal I understand the salt is there to prevent Rainbow Table attacks against the hashed passwords in the database. But when the salt is attached to the front of the hash, the hacker can compare his hashed Rainbow Table against the bcrypt hash in the database to find out that a substring of the bcrypt hashes contains the real hash output for the password... 1/2 – Kevin Frostad Dec 01 '18 at 18:08
  • 2/2 In other words, because the salt is attached to the hash output it won't take much effort for the hacker to pinpoint the password of the output from his Rainbow Table, which contains unsalted bcrypt hashes of common passwords. He just needs to substring away the salt. It might take some iterations but doesn't seem to be a major factor in delaying the attempt at fetching the password. And the likelihood that all the salts will have the same length is probably high. Then he can just substring away the salt for every single password once he has found the first match. – Kevin Frostad Dec 01 '18 at 18:09
  • @Hemal Just as an example. If I had the bcrypt hashes for 3 of the most commonly used password, than the likelihood that I can find a match against a substring of the first 1000 hashes with the attached salts is good, and almost certain against the first 10 000. Now, if the length of every generated salt string is the same as the one I found in my first match against one of the most commonly used passwords. Then I just need to substring away the first part of every password from the database which I know contains the salt. – Kevin Frostad Dec 01 '18 at 18:26
  • Kevin, the hash is generated from the salt + password + key, so it would be different for different passwords even with the same salt. That means you can't come to bat with a hash handy for the three most commonly used passwords -- that doesn't exist. You would need to crack a new hash using the known salt on the fly, but you need a third element: the encryption key, which is a "strong password" in its own right. The random salt adds a layer of difficulty in that you can't come to bat with pre-hashed passwords. You have to hash all the passwords to try for each salt, which is o(n^2). – Alex Weavers Dec 13 '18 at 02:10
  • 2
    @KevinFrostad You found a list of password hashes. "123456" is a very common password, you generate its hash, try it against every entries: instant access to a bunch of accounts. With salt, all the crappy "123456" would actually be altered to be prefixed with a unique salt (the plain text password, not the hash). You now need to generate the SHA1 hash for every entry ("ak5t123456", "j3rt123456", "7w8d123456"). You need more time to get access to that same bunch of accounts, giving the service more time to find out and react about the breach. – Robin Goupil Jan 19 '19 at 07:50
  • @KevinFrostad Protecting the salt, while maybe adding another layer of security, will slow down the authentication process, but it will not necessarily slow down the attacker (which might also encourage some to reduce the work factor). You can still provide your own salt to bcrypt AFAIK. However, I am not sure that the cost of protecting the salt out-weight the cost of not protecting it - it will all go to waste if the auth server get compromised, while the salt's purpose did not. – Robin Goupil Jan 19 '19 at 07:58
  • @JamesDev I see. I thought hashing algorithms like bcrypt was supposed to be so secure that you could store them for anyone to see. I guess a password is only as strong as it's cleartext version. Finding the cleartext of 123456 would take me only a few seconds or so if I figured out the salts length; which would probably only take a few hours, assuming the salt didn't have the deafult length of 29 characters. From there on finding weak passwords would only take as long as the hashing speed (and comparing the hash for matches ofc). – Kevin Frostad Jan 19 '19 at 22:34
  • 2
    @KevinFrostad knowing the salt's length doesn't reduce the level of effort in any meaningful way. you still have to attack each password individually. and each successful attack is independent of the next one. you don't gain any advantage by cracking one weak password, you've merely cracked one weak password. – worc Mar 25 '19 at 18:54
18

This is a simple terms...

Bcrypt does not have a database it stores the salt...

The salt is added to the hash in base64 format....

The question is how does bcrypt verifies the password when it has no database...?

What bcrypt does is that it extract the salt from the password hash... Use the salt extracted to encrypt the plain password and compares the new hash with the old hash to see if they are the same...

Manomite
  • 199
  • 1
  • 5
5

To make things even more clearer,

Registeration/Login direction ->

The password + salt is encrypted with a key generated from the: cost, salt and the password. we call that encrypted value the cipher text. then we attach the salt to this value and encoding it using base64. attaching the cost to it and this is the produced string from bcrypt:

$2a$COST$BASE64

This value is stored eventually.

What the attacker would need to do in order to find the password ? (other direction <- )

In case the attacker got control over the DB, the attacker will decode easily the base64 value, and then he will be able to see the salt. the salt is not secret. though it is random. Then he will need to decrypt the cipher text.

What is more important : There is no hashing in this process, rather CPU expensive encryption - decryption. thus rainbow tables are less relevant here.

jony89
  • 5,155
  • 3
  • 30
  • 40
0

Lets imagine a table that has 1 hashed password. If hacker gets access he would know the salt but he will have to calculate a big list for all the common passwords and compare after each calculation. This will take time and he would have only cracked 1 password.

Imagine a second hashed password in the same table. The salt is visible but the same above calculation needs to happen again to crack this one too because the salts are different.

If no random salts were used, it would have been much easier, why? If we use simple hashing we can just generate hashes for common passwords 1 single time (rainbow table) and just do a simple table search, or simple file search between the db table hashes and our pre-calculated hashes to find the plain passwords.

Khalifa
  • 107
  • 1
  • 4