Iterate over hash function though it reduces search space

Question

I was reading this article regarding the number of times you should hash your password

A salt is added to password before the password is hashed to safeguard against dictionary attacks and rainbow table attacks.

The commentors in the answer by ORIP stated

hashing a hash is not something you should do, as the possibility of hash collision increase with each iteration which may reduce the search space (salt doesn't help), but this is irrelevant for password-based cryptography. To reach the 256-bit search space of this hash you'd need a completely random password, 40 characters long, from all available keyboard characters (log2(94^40))

The answer by erickson recommended

With pre-computation off the table, an attacker has compute the hash on each attempt. How long it takes to find a password now depends entirely on how long it takes to hash a candidate. This time is increased by iteration of the hash function. The number iterations is generally a parameter of the key derivation function; today, a lot of mobile devices use 10,000 to 20,000 iterations, while a server might use 100,000 or more. (The bcrypt algorithm uses the term "cost factor", which is a logarithmic measure of the time required.)

My questions are

1) Why do we iterate over the hash function since each iteration reduces the search space and hence make it easier to crack the password

2) What does search space mean ??

3) Why is the reduction of search space irrelevant for password-based cryptography

4) When is reduction of search space relevant ??

.

You can't believe half of what you read about encryption. And the other half is wrong. — Hot Licks, Dec 30 '13 at 18:34

score 3 · Accepted Answer · answered Dec 30 '13 at 18:29

Let's start with the basic question: What is a search space?

A search space is the set of all values that must be searched in order to find the one you want. In the case of AES-256, the total key space is 2^256. This is a really staggeringly large number. This is the number that most people are throwing around when they say that AES cannot be brute forced.

The search space of "8-letter sequences of lowercase letters" is 26^8, or about 200 billion (~2^37), which from a cryptographic point of view is a tiny, insignificant number that can be searched pretty quickly. It's less than 3 days at 1,000,000 checks per second. Real passwords are chosen out of much smaller sets, since most people don't type 8 totally random letters. (You can up this with upper case and numbers and symbols, but people pick from a tiny set of those, too.)

OK, so people like to type short, easy passwords, but we want to make them hard to brute-force. So we need a way to convert "easy to guess passwords" into "hard to guess key." We call this a Key Derivation Function (KDF). We need two things for it:

The KDF must be "computationally indistinguishable from random." This means that there is no inverse of the hash function that can be computed more quickly than a brute force search.
The KDF should take non-trivial time to compute, so that brute forcing the tiny password space is still very difficult. Ideally it should be made as difficult as brute forcing the entire key space, but it is rare to push it that far.

The first point is the answer to your question of "why don't we care about collisions?" It is because collisions, while they could possibly exist, cannot be predicted in an computationally efficient manner. If collisions could be efficiently predicted, then your KDF function is not indistinguishable from random.

A KDF is not the same as just "repeated hashing." Repeated hashing can be distinguished from random, and is subject to significant attacks (most notably length-extension attacks).

PBKDF2, as a specific KDF example, is proven to be computationally indistinguishable from random, as long as it is provided with a pseudorandom function (PRF). A PRF is defined as itself being computationally indistinguishable from random. PBDFK2 uses HMAC, which is proven to be a PRF as long as it is provided a hashing function that is at least weakly collision resistant (the requirement is actually a bit weaker than even that).

Note the word "proven" here. Good cryptography lives on top of mathematical security proofs. It is not just "tie a lot of knots and hope it holds."

So that's a little tiny bit of the math behind why we're not worried about collisions, but let's also consider some intuition about it.

The total number of 16-character (absurdly long) passwords that can be easily typed on a common English keyboard is about 95^16 or 2^105 (that doesn't count the 15, 14, 13, etc length passwords, but since 95^16 is almost two orders of magnitude larger than 95^15, it's close enough). Now, consider that for each password, we're going to randomly map it to 10,000 intermediate keys (via 10,000 iterations of PBKDF2). That gets us up to 2^118 random choices that we hope never collide in our hash. What are the chances?

Well, 2^256 (our total space) divided by 2^118 (our keys) is 2^138. That means we're using much less than 10^-41 of the space for all passwords that could even be remotely likely. If we're picking these randomly (and the definition of a PRF says we are), the chances of two colliding are, um, small. And if two somehow did, no attacker would ever be able to predict it.

Take away lesson: Use PBKDF2 (or another good KDF like scrypt or bcrypt) to convert passwords into keys. Use a lot of iterations (10,000-100,000 at a minimum). Do not worry about the collisions.

You may be interested in a little more discussion of this in Brute-Forcing Passwords.

score 1 · Answer 2 · answered Dec 30 '13 at 17:37

As the second snippet said, each iteration makes each "guess" a hacker makes take longer, therefore increasing the total time it will take then to crack an average password.
Search space is all the possible hashes for a password after however many iterations you are using. Each iteration decreases the search space.
Because of #1, as the size of the search space decreases, the time to check each possibility increases, balancing out that negative effect.
According to the second snippet, answers #1 and #3 say it actually isn't.

I hope this makes sense, it's a very complicated topic.

score 1 · Answer 3 · answered Dec 30 '13 at 17:40

The reason to iterate is to make it harder for an attacker to brute force the hash. If you have a single round of hashing for a value, then in order to precompute a table for cracking that hash, you need to do 1 * keyspace hashes. If you do 1000 hashes of the value, then it would require the work of 1000 * keyspace.

Search space generally refers to the total number of combinations of characters that could make up a password.

I would say that the reduction of search space is irrelevant because passwords are generally not cracked by attempting 0000000, then 0000001, etc. They are instead attempted to be cracked by using dictionaries and combinatorics. There is essentially a realm of passwords that are likely to get cracked (like "password", "abcdef1", "goshawks", etc.), but creating a larger work factor will make it much more difficult for an attacker to hit all of the likely passwords in the space. Combining that with a salt, means they have to do all of the work for those likely passwords, for every hash they want to crack.

The reduction in search space becomes relevant if you are trying to crack something that is random and could take up any value in the search space.

Iterate over hash function though it reduces search space

3 Answers3