3

I recently came across the following code sample for encrypting a file with AES-256 CBC with a SHA-256 HMAC for authentication and validation:

aes_key, hmac_key = self.keys
# create a PKCS#7 pad to get us to `len(data) % 16 == 0`
pad_length = 16 - len(data) % 16
data = data + (pad_length * chr(pad_length))
# get IV
iv = os.urandom(16)
# create cipher
cipher = AES.new(aes_key, AES.MODE_CBC, iv)
data = iv + cipher.encrypt(data)
sig = hmac.new(hmac_key, data, hashlib.sha256).digest()
# return the encrypted data (iv, followed by encrypted data, followed by hmac sig):
return data + sig

Since, in my case, I'm encrypting much more than a string, rather a fairly large file, I modified the code to do the following:

aes_key, hmac_key = self.keys
iv = os.urandom(16)
cipher = AES.new(aes_key, AES.MODE_CBC, iv)

with open('input.file', 'rb') as infile:
    with open('output.file', 'wb') as outfile:
        # write the iv to the file:
        outfile.write(iv)

        # start the loop
        end_of_line = True

        while True:
            input_chunk = infile.read(64 * 1024)

            if len(input_chunk) == 0:
                # we have reached the end of the input file and it matches `% 16 == 0`
                # so pad it with 16 bytes of PKCS#7 padding:
                end_of_line = True
                input_chunk += 16 * chr(16)
            elif len(input_chunk) % 16 > 0:
                # we have reached the end of the input file and it doesn't match `% 16 == 0`
                # pad it by the remainder of bytes in PKCS#7:
                end_of_line = True
                input_chunk_remainder = 16 - (len(input_chunk) & 16)
                input_chunk += input_chunk_remainder * chr(input_chunk_remainder)

            # write out encrypted data and an HMAC of the block
            outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, 
                    hashlib.sha256).digest())

            if end_of_line:
                break

Simply put, this reads an input file in blocks of 64KB at a time and encrypts these blocks, generating a HMAC using SHA-256 of the encrypted data, and appending that HMAC after each block. Decryption will happen by reading in 64KB + 32B chunks and calculating the HMAC of the first 64KB and comparing it against the SHA-256 sum occupying the last 32 bytes in the chunk.

Is this the right way to use an HMAC? Does it ensure security and authentication that the data was unmodified and decrypted with the right key?

FYI, the AES and HMAC keys are both derived from the same passphrase which is generated by running the input text through SHA-512, then through bcrypt, then through SHA-512 again. The output from the final SHA-512 is then split into two chunks, one used for the AES password and the other used for the HMAC.

Naftuli Kay
  • 87,710
  • 93
  • 269
  • 411

2 Answers2

6

Yes, there are 2 security problems.

But first, I assume that with this statement at the end:

# write out encrypted data and an HMAC of the block
outfile.write(cipher.encrypt(input_chunk) + hmac.new(hmac_key, data, hashlib.sha256).digest())

you actually meant:

# write out encrypted data and an HMAC of the block
data = cipher.encrypt(input_chunk)
outfile.write(data + hmac.new(hmac_key, data, hashlib.sha256).digest())

Because data is not defined anywhere.

The 1st security problem is that you are authenticating each piece independently of the others, but not the composition. In other words, the attacker can reshuffle, duplicate, or remove any of the chunks and the receiver will not notice.

A more secure approach is to have one instance of HMAC only, pass all the encrypted data to it via the update method, and output one digest, at the very end.

Alternatively, if you want to enable the receiver to detect tampering before receiving the whole file, you can output the intermediate MAC for each piece. In fact, a call to digest does not change the state of the HMAC; you can keep calling update afterwards.

The 2nd security problem is that you don't use salt for your key derivation (I say that because you don't send it). Apart from password cracking, if you encrypt more than 2 files using the same password the attacker will also be able to freely mix chunks taken by either encrypted file - because the HMAC key is the same. Solution: use salt.

One last minor thing: infile.read(64 * 1024) may return less than 64*1024 bytes, but that does not mean you reached the end of the file.

Community
  • 1
  • 1
  • 1
    For point number 1, I need to be able to fail very quickly in decrypting the data if the password isn't right. So, I should basically create a HMAC instance before entering the loop, and dump the input data into the HMAC instance using update, then call digest to store a hash after each block, right? Then, each block will be a cumulative hash of the data before it, thus making sure that not only has a single block not been tampered with, but additionally that the entire file is in order and matches the hash. Does this patch the problem? – Naftuli Kay Feb 23 '13 at 00:03
  • 1
    For point number 2, I'm using a salt for my key derivation. The code above is oversimplified, but a salt is stored along with the file in the header to increase security. I also use a high iteration count in the salt to increase security by making it harder to brute-force the password. – Naftuli Kay Feb 23 '13 at 00:05
  • For the final point, I'll need to find a way to work around this issue somehow, as I'll be reading the file over a network socket. – Naftuli Kay Feb 23 '13 at 00:06
-2

I don't think there is a security problem with what you're doing with the HMACs (not that that means there isn't a problem with the security), but I don't know the actual value in HMAC sub elements of the ciphertext gets you. Unless you want to support partial recovery of the plaintext in the event of tampering, there is not much reason to incur the overhead of HMACing 64 KB blocks, vs the full ciphertext.

From a key generation perspective, it might make more sense to use a key generated from a passphrase to encrypt two randomly generated keys, and then use the randomly generated keys to perform HMAC and AES operations. I know using the same key for both your block cipher and HMAC is bad news, but I don't know if using a key generated in the same manner is similarly bad.

At the very least, you should tweak your key derivation mechanism. bcrypt is a password hashing mechanism, not a key derivation function. You should use PBKDF2 to do key derivations.

Peter Elliott
  • 3,273
  • 16
  • 30
  • 1
    Using bcrypt is fine. It is a key derivation function [actually better than PBKDF2](http://security.stackexchange.com/questions/4781/do-any-security-experts-recommend-bcrypt-for-password-storage). – SquareRootOfTwentyThree Feb 22 '13 at 23:39
  • @SquareRootOfTwentyThree that is a post about password hashing, which is a different problem from key derivation. bcrypt has a fixed output of 192 bits, which means you have to do extra stuff like hash the output to get something usable as a key. You don't have to do that with PBKDF2. – Peter Elliott Feb 22 '13 at 23:51
  • To me, "password hashing" means "password-based KDF", but honestly I don't care about naming. bcrypt is better in this case because it is more robust against hardware attacks than PBKDF2. Doing an extra hash at the end is trivial. What matters is that it is secure. – SquareRootOfTwentyThree Feb 23 '13 at 00:23