79

I would like to know the size of my data after AES encryption in advance, so that I can avoid buffering the encrypted data (on disk or in memory) just to learn its size.

I use 128-bit AES with javax.crypto.Cipher and javax.crypto.CipherInputStream for encryption.

A few tests with various input sizes suggest that the post-encryption size calculated as below is correct:

long size = input_Size_In_Bytes; 
long post_AES_Size = size + (16 - (size % 16));

But I am not sure whether the above formula holds for all possible input sizes.

Is there a way to calculate the size of the data after AES encryption in advance, without having to buffer the encrypted data (on disk or in memory) to find out its post-encryption size?

Maarten Bodewes
Ramson Tutte

9 Answers

111

AES has a fixed block size of 16 bytes regardless of key size. Assuming you use PKCS#5/PKCS#7 padding, use this formula:

 cipherLen = clearLen + 16 - (clearLen mod 16)

Please note that if the clear-text is a multiple of the block size then a whole new block is needed for padding. For example, if your clear-text is 16 bytes then the cipher-text will take 32 bytes.

You might want to store the IV (initialization vector) with the cipher-text. In that case, you need to add 16 more bytes for the IV.
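The formula above can be sketched in Java as follows. This is only an illustration, assuming PKCS#5/PKCS#7 padding in a block mode such as CBC or ECB; the class and method names (`AesSizes`, `paddedLength`, `paddedLengthWithIv`) are made up for this example.

```java
final class AesSizes {
    // AES block size in bytes; it does not depend on the key size.
    static final int BLOCK = 16;

    // Ciphertext length under PKCS#5/PKCS#7 padding: always the next
    // multiple of 16 strictly greater than clearLen.
    static long paddedLength(long clearLen) {
        return clearLen + BLOCK - (clearLen % BLOCK);
    }

    // Same as above, plus a 16-byte IV stored alongside the ciphertext.
    static long paddedLengthWithIv(long clearLen) {
        return BLOCK + paddedLength(clearLen);
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, 1, 15, 16, 17, 32}) {
            System.out.println(n + " -> " + paddedLength(n)
                    + " (with IV: " + paddedLengthWithIv(n) + ")");
        }
    }
}
```

Using `long` throughout keeps the calculation usable for inputs larger than 2 GiB, which matters when the size is needed for a stream rather than for an in-memory array.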

Steven Liekens
ZZ Coder
  • 9
    Why is it necessary to add a whole new block for a plaintext of 16 bytes? It's already a multiple of 16. Thanks in advance – Durin Apr 21 '11 at 16:52
  • 7
    Without at least one bit of padding, the block has no endpoint. – Shane Chin Aug 08 '11 at 18:22
  • 2
    @sleske Padding for security reasons is required for e.g. RSA, but not for block cipher modes of operation. Usually padding makes a protocol *less secure* because of padding oracle attacks. It's advised to use an authenticated mode of operation like GCM that 1) does not require padding and 2) provides integrity and authenticity to the ciphertext. – Maarten Bodewes Dec 22 '14 at 18:44
  • 9
    The question of @Durin is a good one. The reason is that there is no method of distinguishing a plaintext of e.g. `10 0F 0E ... 02 01` from a padded plaintext of `10 0F 0E ... 02` + padding `01`. That's why padding is always performed. Sometimes zero padding up to the block size is used if the plaintext size is known "out of band", or if it is known that the value doesn't contain `00` at the end (e.g. for ASCII strings). – Maarten Bodewes Dec 22 '14 at 18:48
  • @MaartenBodewes I read http://www.di-mgt.com.au/cryptopad.html and, as far as I understand, whether padding is added when the original plaintext is already an exact multiple of the cipher block length depends on the padding mode. For example, with the _pad with zeroes_ method we always pad the input, but with the _pad with spaces_ method we pad only when the input size is not a multiple of the cipher block length. Am I right? – Ebrahim Ghasemi May 13 '15 at 05:36
  • 4
    Padding with zeros and padding with spaces are not standardized modes. Bouncy always pads, even with zeros. PHP does not. Neither padding with zeros nor padding with spaces are *deterministic* padding modes. The reason they work at all is that either the plaintext (as bytes) has a known length or that the plaintext has a predetermined format (e.g. just printable ASCII characters). Zero padding may however fail spectacularly if these conditions are not met; e.g. if UTF16LE text ends with `00` (which is likely). In other words, these padding modes exist but they do put constraints on the input. – Maarten Bodewes May 13 '15 at 12:12
  • 1
    @MaartenBodewes Thank you for your comment, dear Maarten. May I ask you to explain why _ZZ-Coder's_ answer is correct? I expect _cipherLen_ to always be a multiple of the cipher's block length, i.e. I expect the output of AES to always be 16/32/... bytes (with padding). But based on _ZZ-Coder's_ answer, we may have post-encryption sizes of any value. I see that the output of the above formula is large enough for the AES output, but I think it is not efficient. In other words, I think the formula in the question is better than the formula in the answer. Am I wrong? – Ebrahim Ghasemi May 18 '15 at 05:38
  • 3
    @Abraham No, that's because the answer is indeed wrong. It's just a quick way of calculating an upper limit. For Java of course you can just query your `Cipher` instance for the correct length (nowadays). On stackoverflow upvotes hardly count for anything. – Maarten Bodewes May 18 '15 at 07:46
36

AES, as a block cipher, does not change the size. The input size is always the output size.

But AES, being a block cipher, requires the input to be a multiple of the block size (16 bytes). For this, padding schemes such as the popular PKCS5 are used. So the answer is that the size of your encrypted data depends on the padding scheme used. At the same time, all known padding schemes round up to the next multiple of 16 (since AES has a 16-byte block size).

Community
Remus Rusanu
  • 2
    There are padding schemes which do not require changing the data size. – usr Sep 13 '12 at 13:37
  • 6
    @usr No, there are *modes of operation* that do not require changing the data size (although usually an IV and/or authentication tag are required as overhead). Padding modes by definition make the data input larger for the cipher. – Maarten Bodewes Dec 22 '14 at 18:40
10

It depends on the mode in which you use AES. What you have is accurate for most of the block oriented modes, such as ECB and CBC. OTOH, in CFB mode (for one example) you're basically just using AES to produce a stream of bytes, which you XOR with bytes of the input. In this case, the size of the output can remain the size of the input rather than being rounded up to the next block size as you've given above.
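The difference between a padded block mode and a stream-like mode can be checked with a small experiment. The sketch below is an illustration only: it assumes the default JCE provider supports the `AES/CBC/PKCS5Padding` and `AES/CFB8/NoPadding` transformations (CFB8, the 8-bit-feedback variant, is chosen here as one concrete form of CFB), and it uses an all-zero key and IV purely for demonstration. `ModeSizeDemo` and `encryptedLength` are hypothetical names.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ModeSizeDemo {
    // Encrypts clearLen zero bytes with the given transformation and
    // returns the length of the resulting ciphertext.
    static int encryptedLength(String transformation, int clearLen) {
        try {
            // All-zero key and IV: acceptable for a size experiment only.
            SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");
            IvParameterSpec iv = new IvParameterSpec(new byte[16]);
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(Cipher.ENCRYPT_MODE, key, iv);
            return cipher.doFinal(new byte[clearLen]).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 15, 16, 17}) {
            System.out.println(n + " bytes: CBC/PKCS5Padding -> "
                    + encryptedLength("AES/CBC/PKCS5Padding", n)
                    + ", CFB8/NoPadding -> "
                    + encryptedLength("AES/CFB8/NoPadding", n));
        }
    }
}
```

With CBC and PKCS5 padding the length is rounded up to the next multiple of 16, while the CFB8 ciphertext has the same length as the input. In both cases the IV is extra overhead if it is stored with the ciphertext.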

Jerry Coffin
5

Generally speaking, for a block cipher encryption:

CipherText = PlainText + Block - (PlainText MOD Block)

The ciphertext size is computed as the size of the plaintext extended to the next block. If padding is used and the size of the plaintext is an exact multiple of the block size, one extra block containing padding information will be added.

AES uses block size of 16 bytes, which produces:

CipherText = PlainText + 16 - (PlainText MOD 16)

Source: http://www.obviex.com/articles/CiphertextSize.pdf

Note:

  1. CipherText and PlainText represent the size of the cipher text and the size of the plain text, respectively.
Zaf
4

The AES cipher always works on 16-byte (128-bit) blocks. If the number of input bytes is not an exact multiple of 16, it is padded. That's why 16 appears to be the "magic number" in your calculation. What you have should work for all input sizes.

In silico
  • Note that at least one padding byte is always added, even when the input length is an exact multiple of 16. – Jeff G Nov 30 '16 at 17:32
1

AES works in 128-bit (16-byte) blocks and converts cleartext blocks into ciphertext blocks of the same length. It pads the last block if it is shorter than 16 bytes, so your formula looks correct.

wRAR
0

If your input length fits in an int (at most Integer.MAX_VALUE), you can use Cipher.getOutputSize(int).
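A minimal sketch of that approach follows, assuming `AES/CBC/PKCS5Padding` with a throwaway all-zero key and IV; `OutputSizeDemo`, `predictedSize` and `actualSize` are hypothetical names. Note that the Javadoc of `Cipher.getOutputSize` only promises a buffer size that is large enough: the actual output of the next `update` or `doFinal` call may be smaller, so the value is best treated as an upper bound unless you have verified it for your provider and mode.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class OutputSizeDemo {
    // Builds a cipher initialized for encryption; getOutputSize requires
    // an initialized Cipher. The all-zero key and IV are for demonstration only.
    static Cipher newEncryptCipher() {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(new byte[16], "AES"),
                    new IvParameterSpec(new byte[16]));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Size reported by the cipher before any data is encrypted.
    static int predictedSize(int clearLen) {
        return newEncryptCipher().getOutputSize(clearLen);
    }

    // Size of the ciphertext actually produced, for comparison.
    static int actualSize(int clearLen) {
        try {
            return newEncryptCipher().doFinal(new byte[clearLen]).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 15, 16, 17}) {
            System.out.println(n + ": predicted " + predictedSize(n)
                    + ", actual " + actualSize(n));
        }
    }
}
```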

ed22
0
long post_AES_Size = size + (16 - (size % 16));

cipherLen = (clearLen/16 + 1) * 16

What @zz-coder and the OP mentioned are the same:

(int(clearLen/16) + 1) * 16
= ((clearLen - clearLen % 16) / 16 + 1) * 16
= clearLen - clearLen % 16 + 16
= clearLen + (16 - clearLen % 16)
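The equivalence can also be checked by brute force. The following sketch compares the two expressions for every length up to a chosen limit; `FormulaCheck`, `byRemainder`, `byQuotient` and `agreeUpTo` are names invented for this illustration, and the integer division in Java plays the role of `int(clearLen/16)`.

```java
final class FormulaCheck {
    // Formula from the question: add the distance to the next block boundary.
    static long byRemainder(long clearLen) {
        return clearLen + (16 - clearLen % 16);
    }

    // Equivalent form: count the full blocks, add one, scale back to bytes.
    static long byQuotient(long clearLen) {
        return (clearLen / 16 + 1) * 16;
    }

    // Returns true if both formulas agree for every length in [0, limit].
    static boolean agreeUpTo(long limit) {
        for (long n = 0; n <= limit; n++) {
            if (byRemainder(n) != byQuotient(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeUpTo(1_000_000));
    }
}
```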
-1

There are approaches to storing encrypted information which avoid the need for any padding provided the data size is at least equal to the block size. One slight difficulty is that if the data size is allowed to be smaller than the block size, and if it must be possible to reconstruct the precise size of the data, even for small blocks, the output must be at least one bit larger than the input, *regardless* of the data size.

To understand the problem, realize that there are 256^N possible files that are N bytes long, and the number of possible files that are no longer than N bytes long is 256^N plus the number of possible files that are no longer than N-1 bytes long (there is one possible file that's zero bytes long, and 257 possible files that are no longer than one byte long).

If the block size is 16 bytes, there will be 256^16 + 256^15 + 256^14 etc. possible input files that are no more than 16 bytes long, but only 256^16 possible output files that are no more than 16 bytes long (since output files can't be shorter than 16 bytes). So at least some possible 16-byte input files must grow. Suppose they would become 17 bytes. There are 256^17 possible seventeen-byte output files; if any of those are used to handle inputs of 16 bytes or less, there won't be enough available to handle all possible 17-byte input files. No matter how big the input can get, some files of that size or larger must grow.
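The counts in this argument can be reproduced with `BigInteger`. The sketch below is illustrative only; `FileCounting`, `exactly` and `atMost` are hypothetical names.

```java
import java.math.BigInteger;

final class FileCounting {
    static final BigInteger BYTE_VALUES = BigInteger.valueOf(256);

    // Number of distinct files that are exactly n bytes long: 256^n.
    static BigInteger exactly(int n) {
        return BYTE_VALUES.pow(n);
    }

    // Number of distinct files that are at most n bytes long:
    // 256^0 + 256^1 + ... + 256^n.
    static BigInteger atMost(int n) {
        BigInteger total = BigInteger.ZERO;
        for (int k = 0; k <= n; k++) {
            total = total.add(exactly(k));
        }
        return total;
    }

    public static void main(String[] args) {
        // One empty file plus 256 one-byte files.
        System.out.println(atMost(1)); // prints 257
        // More inputs of at most 16 bytes than outputs of exactly 16 bytes.
        System.out.println(atMost(16).compareTo(exactly(16)) > 0); // prints true
    }
}
```

Because `atMost(16)` exceeds `exactly(16)`, no scheme can map every input of up to 16 bytes to a distinct 16-byte output, which is the pigeonhole step the paragraph relies on.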

supercat