I'm trying to encrypt and decrypt a file in chunks in C# with AES encryption.

Currently, it reads a file in chunks of 1000 bytes, encrypts each chunk, and writes over the file with the format of {IV}{Chunk0}{Chunk1}{Chunk2}... (without any separation characters and no curly braces).

When decrypting the file, it reads the first 16 bytes from the file, uses it as the AES IV and decrypts the rest of the file in chunks.

My problem is, how can I know the size of each encrypted chunk to decrypt? When encrypting, each chunk consists of 1000 bytes of plain text, but when it's encrypted, this length of 1000 changes.

Should I use a separator between chunks, such as a comma? Or can I avoid separators and instead decrypt by reading chunks of x bytes? (As stated above, can I calculate the size of each encrypted chunk if each one holds 1000 bytes of plaintext before encryption?)

Ari Seyhun
  • AES is a block cipher that processes blocks of 128 bits = 16 bytes. 1000 is not divisible by 16, hence it is a bad choice. – Robert Aug 17 '17 at 13:14
  • Just to clear it up, I'm talking about 1000 bytes of plaintext to be encrypted. The output is always divisible by 16. Each block is padded using `PKCS7`. I understand that it may be a bad idea to use 1000; instead I will use 1024. But my question remains. – Ari Seyhun Aug 17 '17 at 13:18
  • This sounds like something similar to disk encryption. You should read up on how experts do disk encryption, and in particular the block cipher modes they use to maintain security. – President James K. Polk Aug 17 '17 at 14:11
  • It is not clear what you are trying to accomplish so the following may not fit. More information would help. 1. Use a chunk size that is a multiple of the block size. 2. Do not reinitialize the encryption, just continue with the same instance. 3. Padding is only added to the final block when final is called. – zaph Aug 17 '17 at 14:56
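
The size question raised above can be answered with arithmetic. The following is a minimal sketch, assuming each chunk is encrypted independently in a block mode such as CBC with `PKCS7` padding, as described in the comments; the class and method names (`ChunkSizes`, `PaddedLength`) are hypothetical.

```csharp
using System;

static class ChunkSizes
{
    const int BlockSize = 16; // AES block size in bytes

    // Ciphertext length of one independently encrypted, PKCS7-padded chunk.
    // PKCS7 always appends between 1 and 16 bytes, so a chunk whose length
    // is already a multiple of 16 still grows by one full block.
    public static int PaddedLength(int plaintextLength)
    {
        return (plaintextLength / BlockSize + 1) * BlockSize;
    }
}
```

Under those assumptions, `ChunkSizes.PaddedLength(1000)` is 1008 and `ChunkSizes.PaddedLength(1024)` is 1040, so a decryptor could read fixed-size records of that length. Only the last chunk of a file can be shorter, and its ciphertext is simply whatever remains.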

1 Answer

Simplest? When you encrypt each chunk, get the byte length, and then store it as well in your file:

{IV}{LengthOfEncryptedChunk}{EncryptedChunk}{IV}{LengthOfEncryptedChunk} ... etc
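
A minimal sketch of that record layout, assuming a 16-byte IV, a 4-byte little-endian length, and the default `Aes` settings in .NET (CBC mode, `PKCS7` padding). The names `LengthPrefixedChunks`, `WriteChunk`, and `ReadChunk` are hypothetical, and key handling is left to the caller.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class LengthPrefixedChunks
{
    const int IvSize = 16; // AES block size, which is also the IV size

    // Writes one record: {IV}{LengthOfEncryptedChunk}{EncryptedChunk}.
    public static void WriteChunk(BinaryWriter output, Aes aes, byte[] plain, int count)
    {
        aes.GenerateIV(); // fresh random IV for this chunk
        using (ICryptoTransform encryptor = aes.CreateEncryptor())
        {
            byte[] cipher = encryptor.TransformFinalBlock(plain, 0, count);
            output.Write(aes.IV);        // 16 raw bytes
            output.Write(cipher.Length); // Int32, little-endian
            output.Write(cipher);        // raw bytes, no extra prefix
        }
    }

    // Reads one record and returns its plaintext, or null at end of stream.
    public static byte[] ReadChunk(BinaryReader input, Aes aes)
    {
        byte[] iv = input.ReadBytes(IvSize);
        if (iv.Length == 0) return null; // clean end of file
        if (iv.Length < IvSize) throw new InvalidDataException("Truncated IV.");

        int length = input.ReadInt32();
        byte[] cipher = input.ReadBytes(length);
        if (cipher.Length != length) throw new InvalidDataException("Truncated chunk.");

        using (ICryptoTransform decryptor = aes.CreateDecryptor(aes.Key, iv))
        {
            return decryptor.TransformFinalBlock(cipher, 0, cipher.Length);
        }
    }
}
```

The reader never needs to know the plaintext chunk size: it reads the IV, then the stored length, then exactly that many ciphertext bytes, and repeats until `ReadChunk` returns null.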
Kevin
  • This seems like a solution. FYI: I was using a single `IV` for each file, and each chunk is encrypted with the same `IV`. Is this ok for me to do? – Ari Seyhun Aug 17 '17 at 13:23
  • Honestly? You'd probably be okay, depending on the contents. IVs are meant to stop things like Known-Content Attacks and Tampering Attacks. As long as each file has a different IV, about the only thing I'd worry about is if those 1000 bytes were similar-ish to other 1000 bytes in the file. – Kevin Aug 17 '17 at 13:38
  • @Acidic: No, it's probably not ok to use the same IV for each chunk. How bad it is depends on the cipher mode. For counter modes it would likely be a disaster. – President James K. Polk Aug 17 '17 at 14:10
  • Note that in CBC mode the IV for the next block is the previous encrypted block, so there is no need for a new IV for each chunk. There is also no need for padding if each chunk is a multiple of the block size. Then there is no need for interspersed IVs and no need for such a format. – zaph Aug 17 '17 at 15:00
  • I'm pretty sure he's just encrypting each 1024 bytes in a vacuum; it's not subsequent encryption in a chain (because, honestly, what would be the point of even storing byte lengths?). There's not really a cipher mode for chaining, because he's never encrypting more than one block. It's just one-block encryption, for every single block. That's why I don't think the IV needs to vary for each block within a file, because unless there's some way of predicting the contents of any given 1024-byte chunk, there's not really any way of doing a Known-Content attack. The IV just needs to vary by file. – Kevin Aug 17 '17 at 15:22
  • The answer shows an IV per chunk. The OP states: "Currently, it reads a file in chunks of 1000 bytes, encrypts each chunk, and writes over the file", which indicates the entire file is available at once. Confusingly, it also mentions writing **over** the file. A comment also states that the 1000-byte chunks can be another size. Also stated: "is it possible to avoid using a separation character between the chunks and instead decode by reading chunks of x characters?", which tends to confirm the data is not defined by the chunk size. We need clarification of the use case. – zaph Aug 17 '17 at 16:18
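
The alternative zaph describes in the comments (one encryptor for the whole file, padding only on the final block) can be sketched as follows. This is a hedged illustration, not the OP's code: it assumes the whole file is encrypted as a single CBC stream with one IV written at the start, and the names `StreamedAes`, `Encrypt`, `Decrypt`, and `Copy` are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class StreamedAes
{
    const int ChunkSize = 1024; // read buffer size only; it does not shape the output

    // Output layout: {IV}{ciphertext of the whole input, padded once at the end}.
    // Note: disposing the CryptoStream also closes the output stream.
    public static void Encrypt(Stream input, Stream output, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            byte[] iv = aes.IV;
            output.Write(iv, 0, iv.Length);
            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            {
                Copy(input, crypto);
            } // disposing the CryptoStream writes the final padded block
        }
    }

    public static void Decrypt(Stream input, Stream output, byte[] key)
    {
        byte[] iv = new byte[16];
        int got = 0;
        while (got < iv.Length)
        {
            int n = input.Read(iv, got, iv.Length - got);
            if (n == 0) throw new InvalidDataException("Missing IV.");
            got += n;
        }
        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
        {
            Copy(crypto, output);
        }
    }

    // Moves data in ChunkSize pieces so the whole file is never held in memory.
    static void Copy(Stream from, Stream to)
    {
        byte[] buffer = new byte[ChunkSize];
        int read;
        while ((read = from.Read(buffer, 0, buffer.Length)) > 0)
        {
            to.Write(buffer, 0, read);
        }
    }
}
```

With this layout there are no per-chunk lengths or separators to track: the file is still processed 1024 bytes at a time, but the chunk boundaries exist only in the read loop, not in the encrypted file.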