I've been trying to figure out the best way to encrypt big (several GB) files on the file system for later access.
I've been experimenting with several AES modes (particularly CBC and GCM), and I've found pros and cons to each approach.
After researching and asking around, I have come to the conclusion that, at least for the moment, using AES+GCM is not feasible for me, mostly because of the issues it has in Java and the fact that I can't use BouncyCastle.
So I am writing this to describe the protocol I'm going to implement for this task. Please provide feedback as you see fit.
Encryption
- Using AES/CBC/PKCS5Padding with 256 bit keys
- The file will be encrypted using a custom CipherOutputStream. This output stream will take care of writing a custom header at the beginning of the file, consisting of at least the following:
- A few magic bytes to easily tell that the file is encrypted
- IV
- Algorithm, mode and padding used
- Size of the key
- The length of the header itself
- While the file is being encrypted, the resulting ciphertext will also be digested to calculate an authentication tag.
- When the encryption ends, the tag will be appended at the end of the file. The tag has a known size, which makes it easy to recover later (see the sketch after this list).
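To make the layout concrete, here is a minimal sketch of what the encrypting side could look like. It assumes encrypt-then-MAC with HMAC-SHA256 over the ciphertext and a separate MAC key; the magic bytes, the exact header field order, and the class/method names (`EncryptingFileWriter`, `encrypt`) are made up for illustration:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public final class EncryptingFileWriter {

    // Hypothetical magic bytes marking the file as encrypted; any fixed constant works.
    private static final byte[] MAGIC = {'E', 'N', 'C', '1'};
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    /**
     * Writes the header and the ciphertext to 'out' and returns the HMAC-SHA256
     * tag computed over the ciphertext (encrypt-then-MAC). The caller appends
     * the returned tag at the end of the file.
     */
    public static byte[] encrypt(OutputStream out, SecretKey aesKey, SecretKey macKey,
                                 InputStream plaintext) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, aesKey, new IvParameterSpec(iv));

        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);

        // --- header: magic, header length, transformation, key size, IV ---
        byte[] transformation = TRANSFORMATION.getBytes(StandardCharsets.US_ASCII);
        int headerLength = MAGIC.length + 4 + 4 + transformation.length + 4 + iv.length;
        DataOutputStream header = new DataOutputStream(out);
        header.write(MAGIC);
        header.writeInt(headerLength);
        header.writeInt(transformation.length);
        header.write(transformation);
        header.writeInt(aesKey.getEncoded().length * 8); // key size in bits
        header.write(iv);
        header.flush();

        // --- body: a tee that feeds every ciphertext byte into the MAC ---
        OutputStream macOut = new OutputStream() {
            @Override public void write(int b) throws java.io.IOException {
                mac.update((byte) b);
                out.write(b);
            }
            @Override public void write(byte[] b, int off, int len) throws java.io.IOException {
                mac.update(b, off, len);
                out.write(b, off, len);
            }
        };
        try (CipherOutputStream cos = new CipherOutputStream(macOut, cipher)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = plaintext.read(buf)) != -1) {
                cos.write(buf, 0, n);
            }
        }
        return mac.doFinal(); // caller writes this tag right after the ciphertext
    }
}
```

Using two independent keys (one for AES, one for the MAC) is deliberate here; if you only have one master key, you could derive both from it with a KDF rather than reusing the same key for both purposes.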
Decryption
- A custom CipherInputStream will be used. This stream knows how to read the header.
- It will then read the authentication tag and digest the whole ciphertext (without decrypting it) to validate that it has not been tampered with (I haven't actually measured how this will perform, but it's the only way I can think of to safely start decryption without the risk of finding out too late that the file should not have been decrypted in the first place).
- If the tag validates, the header will provide all the information needed to initialize the cipher and let the input stream decrypt the file; otherwise it will fail (see the sketch below).
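Here's a matching sketch of the verification-then-decryption side, under the same assumptions (a 32-byte HMAC-SHA256 tag appended after the ciphertext, and the hypothetical header layout from the encryption sketch). The first pass recomputes the MAC over the ciphertext; only if it matches the stored tag does the second pass return a decrypting stream:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class DecryptingFileReader {

    private static final int TAG_LENGTH = 32; // HmacSHA256 output size

    /**
     * Pass 1: recompute the HMAC over the ciphertext (header and trailing tag
     * excluded) and compare it to the stored tag. Pass 2: re-open the file,
     * skip the header and return a decrypting stream over the ciphertext only.
     */
    public static InputStream openVerified(File file, SecretKey aesKey, SecretKey macKey)
            throws Exception {
        long fileLength = file.length();

        // --- read the header (layout must match the encryption side) ---
        int headerLength;
        String transformation;
        byte[] iv = new byte[16];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))) {
            in.readFully(new byte[4]);            // magic bytes (should be checked)
            headerLength = in.readInt();
            byte[] t = new byte[in.readInt()];
            in.readFully(t);
            transformation = new String(t, StandardCharsets.US_ASCII);
            in.readInt();                         // key size in bits (informational)
            in.readFully(iv);
        }
        long ciphertextLength = fileLength - headerLength - TAG_LENGTH;

        // --- read the stored tag from the end of the file ---
        byte[] storedTag = new byte[TAG_LENGTH];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(fileLength - TAG_LENGTH);
            raf.readFully(storedTag);
        }

        // --- pass 1: recompute the HMAC over the ciphertext only ---
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))) {
            in.readFully(new byte[headerLength]); // skip the header
            byte[] buf = new byte[8192];
            long remaining = ciphertextLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) throw new java.io.EOFException("Truncated file");
                mac.update(buf, 0, n);
                remaining -= n;
            }
        }
        if (!MessageDigest.isEqual(mac.doFinal(), storedTag)) {
            throw new SecurityException("Authentication tag mismatch: refusing to decrypt");
        }

        // --- pass 2: decrypt, making sure the trailing tag never reaches the cipher ---
        Cipher cipher = Cipher.getInstance(transformation);
        cipher.init(Cipher.DECRYPT_MODE, aesKey, new IvParameterSpec(iv));
        DataInputStream raw = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
        raw.readFully(new byte[headerLength]);    // skip the header again
        return new CipherInputStream(new BoundedInputStream(raw, ciphertextLength), cipher);
    }

    /** Minimal stream that stops after 'limit' bytes so the tag is not decrypted. */
    private static final class BoundedInputStream extends FilterInputStream {
        private long remaining;
        BoundedInputStream(InputStream in, long limit) { super(in); this.remaining = limit; }
        @Override public int read() throws java.io.IOException {
            if (remaining <= 0) return -1;
            int b = super.read();
            if (b != -1) remaining--;
            return b;
        }
        @Override public int read(byte[] b, int off, int len) throws java.io.IOException {
            if (remaining <= 0) return -1;
            int n = super.read(b, off, (int) Math.min(len, remaining));
            if (n != -1) remaining -= n;
            return n;
        }
    }
}
```

Note that this does read the file twice (once to verify, once to decrypt), which is the performance cost mentioned above; the upside is that no unauthenticated plaintext is ever produced.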
Does this seem like a reasonable way to handle encryption/decryption of big files?