10

I have a question relating to the use of an Initialization Vector in AES encryption. I am referencing the following articles / posts to build encryption into my program:

[1] Java 256-bit AES Password-Based Encryption
[2] http://gmailassistant.sourceforge.net/src/org/freeshell/zs/common/Encryptor.java.html

I was originally following erickson's solution from the first link but, from what I can tell, PBKDF2WithHmacSHA1 is not supported on my implementation. So, I turned to the second link to get an idea for my own iterative SHA-256 hash creation.

My question is about how the IV is created. One implementation ([1]) uses methods from the Cipher class to derive the IV, whereas the other ([2]) uses the second 16 bytes of the hash as the IV. Quite simply, why the difference, and which is better from a security standpoint? I am also somewhat confused about the derivation and use of IVs (I understand what they are used for, just not the subtler differences), so any clarification is very welcome.

I noticed that the second link uses AES-128 rather than AES-256, which suggests to me that I would have to go up to SHA-512 if I wanted to use this method. This seems like an unfortunate requirement, as the user's password would have to be 16 characters longer to ensure a remotely secure hash, and this app is destined for a cell phone.

Source is available on request, though it is still incomplete.

Thank you in advance.

MysteryMoose
  • It is worth taking a look at http://stackoverflow.com/questions/14937707/getting-incorrect-decryption-value-using-aescryptoserviceprovider/. It discusses setting the `IV` during encryption. – LCJ Feb 19 '13 at 06:28

5 Answers

28

The IV should not be generated from the password alone.

The point of the IV is that even when the same key and plaintext are re-used, a different ciphertext is produced. If the IV is deterministically derived from the password alone, you'd get the same ciphertext every time. In the cited example, a salt is randomly chosen, so a new key is generated even with the same password.

Just use a random number generator to choose an IV. That's what the cipher is doing internally.


I want to stress that you have to store either the IV (if you use the first method) or a salt (if you use the second method) together with the ciphertext. You won't have good security if everything is derived from the password; you need some randomness in every message.
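
A minimal sketch of the "random IV, stored with the ciphertext" approach, assuming AES in CBC mode with PKCS5 padding and a raw key that has already been derived elsewhere. The class and method names (`IvPrefixSketch`, `encrypt`, `decrypt`) are hypothetical, and prefixing the IV to the ciphertext is just one possible storage layout:

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not taken from either linked implementation.
final class IvPrefixSketch {
    private static final int IV_LENGTH = 16; // AES block size in bytes

    // Encrypts with a fresh random IV and returns IV || ciphertext.
    static byte[] encrypt(byte[] keyBytes, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv); // new IV for every message
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(keyBytes, "AES"), new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // The IV is not secret, so it is stored in the clear as a prefix.
            byte[] out = new byte[IV_LENGTH + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_LENGTH);
            System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV back from the first 16 bytes, then decrypts the rest.
    static byte[] decrypt(byte[] keyBytes, byte[] ivAndCiphertext) {
        try {
            IvParameterSpec iv = new IvParameterSpec(ivAndCiphertext, 0, IV_LENGTH);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyBytes, "AES"), iv);
            return cipher.doFinal(ivAndCiphertext, IV_LENGTH,
                    ivAndCiphertext.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `encrypt` twice with the same key and plaintext should give two different outputs, and `decrypt` should recover the plaintext from either one using only the key.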

erickson
  • 2
    Ok, that makes more sense. One more question, then: If Cipher is using a random number generator internally, how do I decrypt? As I understand it, the same IV needs to be used for both encryption and decryption. Do they simply use the key as the seed for the generator to get the same IV per key or is there something more going on? – MysteryMoose Dec 21 '10 at 23:19
  • 2
    @phobos51594 - You have to store the IV with the ciphertext. One way to think of the IV is as the first block of ciphertext, so you could store it as a prefix to the actual output of the cipher. Or whatever. It doesn't need to be kept secret; just store it with the rest of the ciphertext. – erickson Dec 21 '10 at 23:24
  • 2
    Ignore the previous comment, I just saw your edit. That makes even more sense. So the pattern would go something along the lines of: Hash password, generate IV using Cipher or a secure random feature, encrypt with hash-IV-salt, store previous values, decrypt with stored values, discard IV, celebrate, repeat. – MysteryMoose Dec 21 '10 at 23:25
  • As phobos51594 noted, link [1] didn't work on Android because the "PBKDF2WithHmacSHA1" algorithm causes a NoSuchAlgorithmException, but "PBEWITHSHA256AND128BITAES-CBC-BC" seems to work OK. – bob Jun 13 '11 at 04:31
4

My understanding is that the initialization vector is just random input to the encryption algorithm; without it, you would always get the same result for the same input. The initialization vector is stored together with the ciphertext and is not secret in any way. Just use a secure random function to generate it. PBKDF* algorithms are used to derive secret keys of the desired length for encryption algorithms from user-entered passwords.

The first implementation that you link to simply lets the Cipher object generate the initialization vector. It then fetches the generated IV to store it together with the ciphertext.

The second one uses part of the hash bytes. Any approach that generates non-repeating IVs is good enough.

The most important property of an IV is that it doesn't repeat (very often).
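
To illustrate the key-derivation side, here is a rough sketch of turning a password into an AES key with PBKDF2 and a random salt. It assumes a runtime that actually provides "PBKDF2WithHmacSHA1" (the asker reports that theirs does not). The class name `PasswordKeySketch`, the 16-byte salt, the 65536 iterations, and the 256-bit key length are illustrative assumptions, not values taken from this thread:

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper showing password -> key derivation only.
final class PasswordKeySketch {
    // A random salt; it is not secret and must be stored with the ciphertext.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a 256-bit AES key from the password and salt.
    static SecretKeySpec deriveKey(char[] password, byte[] salt) {
        try {
            // Iteration count and key length are assumptions for this sketch.
            PBEKeySpec spec = new PBEKeySpec(password, salt, 65536, 256);
            SecretKeyFactory factory =
                    SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            byte[] keyBytes = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same password and salt always yield the same key, which is what makes decryption possible later; the IV is generated separately and does not come out of this step.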

Peter Štibraný
  • 1
    I am assuming that I will have to store the IV unless I derive it from the original key somehow? So the first implementation is actually more similar to the second than I thought in that they both (somehow) get an IV from a given key? In that case couldn't I just use the key as the IV? – MysteryMoose Dec 21 '10 at 22:45
  • 2
    Well, the key has to remain secret. The IV is public. You don't want to use the key as your IV, or you'll be broadcasting your secret key to everyone! For the same reason, but to a lesser extent, this is why you don't want to have your IV based on the key using hashing. (Like [2] does) Since if the hashing algorithm has a weakness discovered in it, an attacker can derive your key from the IVs. – AltF4 Dec 21 '10 at 22:56
  • Ok, that makes sense. So [2] is really just bad practice. – MysteryMoose Dec 21 '10 at 23:14
  • @phobos51594: It is certainly a bit strange... [2] uses key + salt to generate the hash, and extracts the IV from the hash. Now the problem of randomness depends on the salt. If someone passes the same salt each time, they will get the same IV, which *is a problem*. The problem with [2] is that now you have to remember the salt, as well as the IV :-) – Peter Štibraný Dec 22 '10 at 12:49
  • In [2], since the IV is derived entirely from the salt+password you wouldn't need to store the IV. You could just store the salt and combine it with the password at runtime to generate the IV. But this design makes the strength of the IV tantamount to the strength of the salt. And I don't think I see in the code any length guarantees for the salt. It could be very small depending on how it's used. – AltF4 Dec 23 '10 at 15:14
  • @AltF4: yes, you are right. If you remember the salt, you can derive IV. – Peter Štibraný Dec 23 '10 at 15:17
4

Cryptographers should generate IVs using a secure pseudo-random number generator.

Application developers should use existing, off the shelf cryptography. I suggest that you use SSL with certificates to secure your network traffic and GPG to secure file data.

There are so many details that can make an implementation insecure, such as timing attacks. When an application developer is making decisions between AES 128 and AES 256 it is nearly always pointless since you've likely left a timing attack that renders the extra key bits useless.

Spike Gronim
  • 1
    I didn't mention this in the original post, but this is not for network traffic of any sort. I am going to be encrypting a file that will never leave the device unless the entire system is compromised. – MysteryMoose Dec 21 '10 at 22:42
  • so you're putting an encryption key and an encrypted file on the same device? That's pointless. – Spike Gronim Dec 22 '10 at 20:58
  • Are timing attacks really an issue in networked applications? I would assume that network speed variability would wipe out that information. If not, a simple sleep until X ms past encrypt start would fix this. – Keynan Sep 25 '13 at 07:39
1

The IV is just a consequence of the use of block chaining. I presume that this is more than a simple API design question. I assume that you know that the reasoning for using it is so that the same plaintext will not show up as the same ciphertext in multiple blocks.

Think about recursion from the last block where the Nth ciphertext block depends in some way on the (N-1)th block, etc. When you get to the first block, 0th block, you need some data to get started. It doesn't matter what that data is as long as you know it before you attempt to decrypt. Using non-secret random data as an initialization vector will cause identical messages encrypted under the same key to come out as completely different ciphertext.

It's similar in concept to salting a hash. And that source code looks a little fishy to me. An IV should simply be fresh-at-encryption-time random bits dependent upon nothing, like a nonce. The IV is basically part of the encrypted message. If you re-encrypt identical data with identical key, you should not be able to correlate the messages. (Hey, think about the consequences of correlating by ciphertext length as well.)
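
The correlation point can be seen in a small experiment. This is only a sketch, assuming AES/CBC with PKCS5 padding; `IvCorrelationSketch`, `encryptCbc`, and `randomIv` are hypothetical names. With a fixed IV, identical messages under the same key produce identical ciphertext; with a fresh random IV they do not:

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class for comparing fixed and fresh IVs.
final class IvCorrelationSketch {
    // Encrypts under AES/CBC with a caller-supplied IV (ciphertext only, no IV prefix).
    static byte[] encryptCbc(byte[] keyBytes, byte[] iv, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(keyBytes, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fresh random bits that depend on nothing else.
    static byte[] randomIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        return iv;
    }
}
```

Encrypting the same message twice with an all-zero IV gives byte-for-byte equal ciphertexts, so an observer can tell the messages match. Using `randomIv()` for each call removes that link, although the ciphertext length still reveals the approximate message length.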

Rob
  • That bit makes sense. My confusion comes in generating an IV to make the best encryption possible (within practicality). – MysteryMoose Dec 21 '10 at 22:47
  • Trying to get around the burden of transmitting the IV with the encrypted data defeats the purpose of having an IV. Using an IV that's derived from a secret is missing the point, it defeats the purpose of it actually. The point is that the IV is *fresh* and *unpredictable*, so that previously encrypted messages cannot be correlated. – Rob Dec 22 '10 at 05:13
  • Sometimes the new IV is computed as (previous IV) + 1, so it can be predictable. I am not sure whether this introduces problems or not, though. – Peter Štibraný Dec 22 '10 at 12:43
  • @Peter, I see what you mean. But you still need to transmit the IV as part of the message, implicitly or explicitly, even if this is only done on the first of a series of encryptions in a session. However, think for a moment about giving this advice in the absence of deep details on the particular cipher. With predictable IVs, the chances of being able to correlate messages would be certainly higher (or the same) than for random IVs. – Rob Dec 29 '10 at 16:59
  • Puzzle: 2 instances of my cipher with the same key, using IVs that start at 0 for each instance and increment. If I always encoded the same small number of messages, there would be correlations. If there were only one instance of it, then incrementing maximizes the period until it's known to repeat. So I guess the determining factor would be the probability of generating a collision if you used randomly generated IVs. – Rob Dec 29 '10 at 17:09
  • @Rob. Using an IV that simply increments per block is a known and currently-known-to-be-secure method. See the "Counter" encryption mode here: http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation – Ch'marr Aug 05 '13 at 20:24
1

As with everyone else here, I've always known IVs to be just chosen randomly using the standard algorithms for doing so.

The second reference you provided, though, doesn't seem to be doing that. Looks like he salts a password and hashes it. Then takes that hash and splits it up into halves. One half is the encryption key, one is the IV. So the IV is derived from the password.

I don't have any strong breaks for such a method, but it's bad design. The IV should be independent and random all on its own. A break might become possible if there's a weakness in the hashing algorithm or if you choose a weak password. You don't want the IV to be derivable from anything else, or it becomes conceivable to launch pre-computation attacks.
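
For illustration only, the hash-and-split scheme described above can be approximated as follows. This is a simplified reconstruction, not the code from [2]: it assumes a single SHA-256 pass over salt followed by password (the question mentions an iterative hash), and `DerivedIvSketch` and `keyAndIv` are hypothetical names. It is here to show that the IV is fully determined by the password and salt, not as something to copy:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical reconstruction of the pattern being criticized.
final class DerivedIvSketch {
    // Returns { key, iv }: the first and second 16 bytes of SHA-256(salt || password).
    static byte[][] keyAndIv(String password, byte[] salt) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(salt);
            byte[] hash = sha256.digest(password.getBytes(StandardCharsets.UTF_8));
            byte[] key = Arrays.copyOfRange(hash, 0, 16); // first half: AES-128 key
            byte[] iv = Arrays.copyOfRange(hash, 16, 32); // second half: IV
            return new byte[][] { key, iv };
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `keyAndIv` twice with the same password and salt returns the same IV both times, so all of the per-message freshness has to come from the salt.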

AltF4