Why can't we just use a hash of passphrase as the encryption key (and IV) with symmetric encryption algorithms?

Question

Inspired by my previous question, I now have an interesting idea: do you ever really need to use Rfc2898DeriveBytes or similar classes to "securely derive" the encryption key and initialization vector from the passphrase string, or will a simple hash of that string work equally well as a key/IV when encrypting data with a symmetric algorithm (e.g. AES, DES, etc.)?

I see tons of AES encryption code snippets where the Rfc2898DeriveBytes class is used to derive the encryption key and initialization vector (IV) from the password string. It is assumed that one should use a random salt and a shitload of iterations to derive a secure enough key/IV for the encryption. While deriving bytes from a password string using this method is quite useful in some scenarios, I think it's not applicable when encrypting data with symmetric algorithms! Here is why: using a salt makes sense when there is a possibility of building precalculated rainbow tables, so that an attacker who gets his hands on a hash can look up the original password. But with symmetric data encryption, I think this is not required, as the hash of the password string, or the encryption key, is never stored anywhere. So, if we just take the SHA1 hash of the password and use it as the encryption key/IV, isn't that going to be equally secure?

What is the purpose of using the Rfc2898DeriveBytes class to generate the key/IV from a password string (which is a very performance-intensive operation), when we could just use a SHA1 (or any other) hash of that password? A hash would result in a random bit distribution in the key (as opposed to using the string bytes directly). And attacker would have to brute-force the whole range of key (e.g. if key length is 256bit he would have to try 2^256 combinations) anyway.

So either I'm wrong in a dangerous way, or all those AES encryption samples (including many upvoted answers here at SO) that use the Rfc2898DeriveBytes method to generate the encryption key and IV are just wrong.

Have you read RFC 2898? It's a good place to start to understand why it was chosen to implement key derivation this way. http://www.ietf.org/rfc/rfc2898.txt — JamieSee, Sep 21 '12 at 15:46
It's OK to ask yourself these questions. What's not so ok is posting your ideas verbatim to stackoverflow. This is clearly off topic. Furthermore, PBKDF2 is part of *password based encryption*. Now if this did not make sense, don't you think somebody would have noticed by now? Did you actually read the RFC, so clearly indicated by the `Rfc2898DeriveBytes` class? — Maarten Bodewes, Sep 21 '12 at 21:50

score 1 · Answer 1 · answered Sep 21 '12 at 15:36

If you use a plain hash of the password as the encryption key, every key is only as strong as the password behind it, which lets an attacker break at least some encrypted values very easily (by hashing "password123", "chocolate", etc. and trying the results as keys). The best way to secure symmetric encryption is a single long random key that is managed so it cannot be sniffed by anyone.
As to your first question, about using Rfc2898DeriveBytes: the point is to ensure that the key you derive from a password is hard to precompute or brute-force. The random salt makes each derived key unique even when two users pick the same password, and the iteration count makes every guess expensive. If you took a password and performed only a simple hash, the key used for encryption would be both weak (it comes from a restricted character set and length) and predictable.
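A minimal sketch of the difference, written in Python because `hashlib.pbkdf2_hmac` implements the same PBKDF2 function from RFC 2898 that Rfc2898DeriveBytes does (with HMAC-SHA1 as the default PRF). The helper names `derive_key` and `naive_key`, the 16-byte salt, and the iteration count are assumptions chosen for illustration, not recommendations from the thread.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA1 (RFC 2898).

    The iteration count is an illustrative assumption, not a recommendation.
    """
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, iterations, dklen=32
    )

def naive_key(password: str) -> bytes:
    """The approach from the question: one unsalted hash used as the key."""
    return hashlib.sha256(password.encode("utf-8")).digest()

# Two messages encrypted under the same password get different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

key_a = derive_key("chocolate", salt_a)
key_b = derive_key("chocolate", salt_b)

# The salt is not secret: it is stored next to the ciphertext so the
# legitimate user can re-derive the same key at decryption time.
rederived = derive_key("chocolate", salt_a)
```

With the naive approach, "chocolate" always maps to the same key, so one precomputed list of hashed common passwords works against every ciphertext. With a salt, the same password yields unrelated keys, and the attacker has to redo the (slow) derivation per salt.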

score 1 · Accepted Answer · answered Sep 21 '12 at 16:16

"And attacker would have to brute-force the whole range of key (e.g. if key length is 256bit he would have to try 2^256 combinations)"

This is where you've gone wrong. If weak passwords have, say, 8 characters, and there are roughly 5 bits of entropy per ASCII character, then there are around (2^5)^8 weak passwords, which is about 2^40. Since you're not using a salt, there are therefore only about 2^40 possible keys, no matter how many bits the hash output has. These keys are easy to generate by iterating through the possible combinations of 8 characters and hashing each of them. That is considerably easier to brute-force than 2^256.
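The enumeration described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the password is only 3 lowercase letters so the loop finishes instantly, `naive_key` and `crack` are hypothetical helper names, and comparing against the known key stands in for what a real attacker would do (try each candidate key on the ciphertext and check whether the plaintext makes sense).

```python
import hashlib
import itertools
import string

def naive_key(password: str) -> bytes:
    """Unsalted single hash used directly as a 256-bit key."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def crack(target_key: bytes, alphabet: str, length: int):
    """Enumerate every password of the given length and hash each one."""
    for candidate in itertools.product(alphabet, repeat=length):
        guess = "".join(candidate)
        if naive_key(guess) == target_key:
            return guess
    return None

# Keyspace sizes from the answer: passwords versus raw 256-bit keys.
weak_keyspace = (2 ** 5) ** 8   # 8 characters at about 5 bits each
full_keyspace = 2 ** 256

# Toy demonstration: a 3-letter password means only 26**3 candidates.
victim_key = naive_key("cab")
recovered = crack(victim_key, string.ascii_lowercase, 3)
```

The key is 256 bits long, but the attacker never searches the 256-bit space. The search covers only the passwords a person might plausibly choose.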

score 0 · Answer 3 · answered Sep 21 '12 at 15:29

The point of Rfc2898DeriveBytes is to be slower.
By repeating the hash 1,000 or more times, you make every brute-force guess at the password orders of magnitude slower.
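To make "repeating the hash" concrete, here is a sketch of how PBKDF2 (RFC 2898) builds one output block: it chains HMAC calls and XORs the results together, so the cost grows linearly with the iteration count. The function name `pbkdf2_first_block` is an assumption for illustration. It computes only the first 20-byte block with HMAC-SHA1, and the last line derives the same block with the standard library for comparison.

```python
import hashlib
import hmac

def pbkdf2_first_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First PBKDF2-HMAC-SHA1 block: T_1 = U_1 xor U_2 xor ... xor U_c."""
    # U_1 = HMAC(password, salt || INT(1)), where INT(1) is a 4-byte
    # big-endian block index.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha1).digest()
    block = int.from_bytes(u, "big")
    # U_i = HMAC(password, U_{i-1}); each iteration costs one more HMAC.
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        block ^= int.from_bytes(u, "big")
    return block.to_bytes(20, "big")

manual = pbkdf2_first_block(b"password", b"salt", 1000)
library = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1000, dklen=20)
```

The legitimate user pays for these 1,000 HMAC calls once per encryption or decryption. An attacker pays for them again on every password guess.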

SLaks

Yes, but that's not required with symmetric algorithms at all! The attacker would brute-force the key itself, not the password string! So I don't see any reason to slow down the derivation of the key from the password, because the attacker would try to brute-force the key itself, not the password it is derived from. Isn't that the case? – TX_ Sep 21 '12 at 15:33
@TX_: The size of a password is much less than the size of a symmetric key. It makes much more sense to brute-force the password. – SLaks Sep 21 '12 at 15:53
@TX_ If you add enough repetitions so that it takes 0.1 second to repeatedly hash the password, then a brute force attacker can try a maximum of 10 possible passwords per second. That will greatly increase the time it takes the attacker to work through the whole list of likely passwords. You won't stop her, but you will slow her down by a significant amount. Salting means that she has to repeat the entire password search for each different salt. RFC2898 is there for a good reason. See http://www.ietf.org/rfc/rfc2898.txt – rossum Sep 21 '12 at 20:55
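The arithmetic behind that comment, as a rough sketch. It reuses the 2^40 weak-password estimate from the accepted answer and the 10 guesses per second from the comment. The rate of one billion unsalted hashes per second is an assumed round number for comparison only, and the model ignores that an attacker can run many machines in parallel.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaust_time_seconds(candidates: int, guesses_per_second: float) -> float:
    """Time to try every candidate password at a fixed guessing rate."""
    return candidates / guesses_per_second

weak_passwords = 2 ** 40  # estimate from the accepted answer

# 0.1 s per derivation means 10 guesses per second (figure from the comment).
slow_seconds = exhaust_time_seconds(weak_passwords, 10)
slow_years = slow_seconds / SECONDS_PER_YEAR

# Single unsalted hash at an ASSUMED 1e9 hashes per second.
fast_seconds = exhaust_time_seconds(weak_passwords, 1e9)
fast_minutes = fast_seconds / 60

print(f"slow derivation: about {slow_years:,.0f} years")
print(f"plain hash:      about {fast_minutes:,.0f} minutes")
```

Under these assumptions the same password list goes from minutes of work to thousands of years on a single machine, which is the slowdown the comment describes.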
