
Setup

Given the following:

```php
$s = Crypt::encryptString('a');
```

Is it possible to know, for a string of length 1, the possible range of lengths of $s?

Context

Database storage: I need to store an encrypted value, and would like to validate the input string so that the longest allowed input, once encrypted, can be inserted into the DB without truncation.

Basic tests

Running some very crude tests locally, using the following snippet:

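```php
use Illuminate\Support\Facades\Crypt;
use Illuminate\Support\Facades\Route;

Route::get('/test', function () {
    echo '<table>';
    for ($i = 0; $i < 100; $i++) {
        $s = str_repeat('a', $i);                 // plaintext of $i bytes
        $l1 = strlen($s);
        $l2 = strlen(Crypt::encryptString($s));   // length of the encrypted payload
        echo "<tr><td>$l1</td><td>$l2</td></tr>";
    }
    echo '</table>';
});
```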

I see the following, but it varies between runs: for example, a string of 'a' encrypts to a length of either 188 or 192 (longer values seem to be between 244 and 248).

So there must be a formula. I have seen output_size = input_size + (16 - (input_size % 16)), but it doesn't account for the variance.
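The formula quoted above is PKCS#7 padding as used with AES-CBC: between 1 and 16 bytes are always appended, so an input that is already a multiple of 16 bytes grows by a whole block. A minimal sketch, assuming AES's 16-byte block size (the helper name is hypothetical):

```php
<?php
// Hypothetical helper: length in bytes of a plaintext after PKCS#7 padding.
// Padding always adds between 1 and $blockSize bytes, never zero.
function pkcs7PaddedLength(int $inputSize, int $blockSize = 16): int
{
    return $inputSize + ($blockSize - ($inputSize % $blockSize));
}

foreach ([0, 1, 15, 16, 17, 32] as $n) {
    echo $n, ' -> ', pkcs7PaddedLength($n), PHP_EOL;
}
```

This accounts for the step from the 188/192 band to the 216/220 band at 16 characters in the output below, but not for the 4-character variation within a band.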

Output

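| Input length | Encrypted length |
|---:|---:|
| 0 | 192 |
| 1 | 188 |
| 2 | 188 |
| 3 | 192 |
| 4 | 188 |
| 5 | 188 |
| 6 | 188 |
| 7 | 192 |
| 8 | 192 |
| 9 | 188 |
| 10 | 188 |
| 11 | 192 |
| 12 | 192 |
| 13 | 192 |
| 14 | 192 |
| 15 | 192 |
| 16 | 220 |
| 17 | 220 |
| 18 | 216 |
| 19 | 216 |
| 20 | 220 |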

Edit

OK, so after chatting with @Luke Joshua Park below: the variance in length comes from the Laravel encryption function and the way $iv is created. $iv is random bytes, and its base64 encoding can contain /.

$value inside the encryption method can also contain a /.

When values that contain a / are JSON encoded, the / is escaped to \\\/ adding an additional 3 characters per occurrence.
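The effect can be measured without Laravel by JSON-encoding a base64 string the way the envelope does. A minimal sketch, assuming PHP's default json_encode() flags, under which each / is written as \/ (the helper name is hypothetical):

```php
<?php
// Hypothetical helper: characters json_encode() adds to a string beyond the
// two surrounding quotes. In base64 text (A-Z, a-z, 0-9, +, /, =) only '/'
// is escaped under the default flags.
function jsonEscapeOverhead(string $s, int $flags = 0): int
{
    return strlen(json_encode($s, $flags)) - strlen($s) - 2;
}

// A random 16-byte IV always base64-encodes to 24 characters...
$iv = base64_encode(random_bytes(16));

// ...but its JSON-encoded size depends on how many '/' it happens to contain.
printf(
    "slashes: %d, default overhead: %d, with JSON_UNESCAPED_SLASHES: %d\n",
    substr_count($iv, '/'),
    jsonEscapeOverhead($iv),
    jsonEscapeOverhead($iv, JSON_UNESCAPED_SLASHES)
);
```

Because the sketch measures the escaped output directly, it does not depend on an assumed per-slash cost; passing JSON_UNESCAPED_SLASHES removes the overhead entirely.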

The real problem - can $iv and $value contain more than a single '/'?

Ing. Luca Stucchi
Chris
  • [Laravel `Crypt`](https://laravel.com/docs/5.7/encryption) uses AES-256-CBC. If you're looking to store that in a DB, [this question already covers that](https://stackoverflow.com/questions/33701936/how-to-store-aes-encrypted-information-in-mysql-database) – Machavity Feb 20 '19 at 15:29
  • You also have to consider that UTF-8 characters can have up to four bytes. – Jonas Staudenmeir Feb 20 '19 at 22:04
  • What database are you using? MySQL? – Jonas Staudenmeir Feb 21 '19 at 04:12
  • Yep, but it shouldn’t matter - in the sense I want to be able to calculate the size for storage regardless of the storage type – Chris Feb 21 '19 at 10:52
  • @patricus This isn't a duplicate. It's similar but not a duplicate. There is a lot of valuable information in the presented answers (arguably more than the "original" question you marked), so by closing this one off, future users are missing out on a lot of really useful info. Although the context for this is db storage, it's about encryption – Chris Mar 04 '20 at 02:30
  • @Chris The other question was asked 2 years prior, and asks for the length to store in the database, which requires calculating the max length of the result. That is the same as this question. There's no loss of information here, as this Q/A is still available for anyone to see (it isn't deleted). There is only an expansion of information, as now viewers of each question can easily find the other question. – patricus Mar 04 '20 at 21:33
  • @Chris Just an FYI, [PR #31721](https://github.com/laravel/framework/pull/31721) was merged into 7.0.8, so as of 7.0.8, the forward slashes will no longer be escaped in the json, and the encrypted size should be deterministic. – patricus Mar 13 '20 at 15:07

3 Answers


Looking through the source code for Crypt::encryptString, we can see that the final result will be a base64 encoded JSON object that has the following structure:

```json
{ "iv": "<128 bits in base64>", "value": "<x bits in base64>", "mac": "<256 bits in hex>" }
```

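Where x = (floor(n / 128) + 1) * 128 and n is the number of bits in the original plaintext: PKCS#7 padding always adds at least one byte, so a plaintext that is already a multiple of 128 bits gains a whole extra block.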

This means that, for an input plaintext of length 1, the size of the output should be:

  • 24 characters for the IV (base64).
  • 24 characters for the ciphertext (base64).
  • 64 characters for the SHA256 mac (hex).
  • 10 characters for the names of the JSON fields.
  • 19 characters of JSON punctuation, e.g. {, ", :.
  • A final round of base64 encoding of the whole thing: 24 + 24 + 64 + 10 + 19 = 141 characters, so ceil(141 / 3) * 4.

Gives a total of 188. The fluctuations up to 192 are odd: the padded input is not changing in size at all (it is always 16 bytes for plaintext lengths 0 to 15).
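The arithmetic above can be checked with a small function. A sketch under this answer's assumptions (16-byte IV, AES-CBC with PKCS#7 padding, a 64-character hex MAC, and no / needing escape); the helper names are hypothetical, and the fixed JSON overhead is measured with json_encode() on an envelope with empty values:

```php
<?php
// Hypothetical helpers reproducing the size arithmetic in this answer.
function base64Len(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3); // 4 * ceil(bytes / 3)
}

function predictedPayloadLength(int $n): int
{
    $cipherBytes = (intdiv($n, 16) + 1) * 16; // PKCS#7-padded plaintext = ciphertext size
    // Key names plus braces, quotes, colons and commas: 10 + 19 = 29 characters.
    $overhead = strlen(json_encode(['iv' => '', 'value' => '', 'mac' => '']));
    $json = base64Len(16)           // IV, 24 characters
          + base64Len($cipherBytes) // ciphertext
          + 64                      // HMAC-SHA256 as hex
          + $overhead;
    return base64Len($json);        // final base64 of the whole JSON
}

foreach ([0, 1, 15, 16, 20] as $n) {
    echo $n, ' -> ', predictedPayloadLength($n), PHP_EOL;
}
```

It prints 188 for lengths 0, 1 and 15 and 216 for 16 and 20, the lower value of each band in the question's table; the 4-character bumps are not modelled.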

Luke Joshua Park
  • This is a great answer, but yeah the variances are quite odd. But running `echo strlen(Crypt::encryptString('a'))` multiple times will nearly always give one of two results (188 or 192). If I can understand what the maximum possible value is, that would be ideal – Chris Feb 18 '19 at 04:51
  • Is it possible the cipher is time based in some regard, that causes minor fluttering between values? – Chris Feb 18 '19 at 04:57
  • The only randomness involved is when the IV is generated, but it is generated at a fixed length of 128 bits. – Luke Joshua Park Feb 18 '19 at 05:02
  • Indeed, just digging a little deeper now - give me 5 mins to figure out the variance – Chris Feb 18 '19 at 05:08
  • Ah, damn - ok, I see... `$iv` is random bytes, which can contain `/`. `$value` can also contain a `/`. When that is JSON encoded, the `/` is escaped to `\\\/` adding an additional 3 characters per occurrence. The real problem now, is can $iv and $value contain many many `/` ? – Chris Feb 18 '19 at 05:16

The real problem - can $iv and $value contain more than a single '/'?

Sure. Your worst case for the IV is the IV FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (hex), which has a Base64 value of /////////////////////w==.

21 forward slashes * extra 3 bytes each = 63 extra bytes.

For the HMAC-SHA-2-256, you could get 32 bytes of 0xFF (worst case), which is //////////////////////////////////////////8= in base64.

42 forward slashes => 126 extra bytes.

For the ciphertext, again, the entire output could be (but likely isn't) FF FF ... FF. All one letter inputs (no matter what encoding) are a single block of ciphertext, making the output be /////////////////////w== again (+63).
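The worst cases quoted above can be reproduced with PHP's base64_encode(). A minimal sketch (the helper name is hypothetical): all-0xFF input turns every full 6-bit group into base64 value 63, which is /, while a trailing partial group is zero-padded and so can never be /.

```php
<?php
// Hypothetical helper: the most '/' characters base64 can produce from $bytes
// bytes, i.e. the number of full 6-bit groups in 8 * $bytes bits.
function maxBase64Slashes(int $bytes): int
{
    return intdiv(4 * $bytes, 3);
}

foreach ([16, 32] as $bytes) {
    $encoded = base64_encode(str_repeat("\xFF", $bytes));
    printf("%d bytes: %s (%d slashes, bound %d)\n",
        $bytes, $encoded, substr_count($encoded, '/'), maxBase64Slashes($bytes));
}
```

For 16 bytes this gives 21 slashes and for 32 bytes 42, matching the figures in this answer.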

The generalized formula for the maximum seems to be

  • IV: 24 + 63 = 87
  • HMAC: 24 + 63 = 87
  • JSON Property Names: 10
  • JSON Structure: 19
  • Ciphertext: ceil(ceil((n+1) / 16) * 16 / 3) * 4 * 4 (I used n as bytes; the padded ciphertext is ceil((n+1) / blocksize) * blocksize, base64 is 4 * ceil(data / 3), and the extra * 4 is "everything is slashes")
  • Base64 it all again: 4 * ceil(sum / 3)

= 4 * ceil((4 * 4 * ceil(16 * ceil((n + 1) / 16) / 3) + 203) / 3)
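The loose bound can be transcribed directly to check the arithmetic. A sketch, using integer ceiling division to avoid floating-point rounding; the constant 203 is 87 + 87 + 10 + 19 from the list above, and the function names are hypothetical:

```php
<?php
// Hypothetical transcription of the loose upper bound; $n = plaintext bytes.
function ceilDiv(int $a, int $b): int
{
    return intdiv($a + $b - 1, $b);
}

function looseUpperBound(int $n): int
{
    $padded      = 16 * ceilDiv($n + 1, 16);    // PKCS#7-padded ciphertext bytes
    $cipherWorst = 4 * 4 * ceilDiv($padded, 3); // base64 length, x4 for "everything is slashes"
    return 4 * ceilDiv($cipherWorst + 203, 3);  // 203 = IV 87 + HMAC 87 + names 10 + structure 19
}

echo looseUpperBound(1), PHP_EOL; // 400
```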

For n=1 that produces 400 bytes. The actual maximum is (I think) 388, because the ciphertext formula counts 24 slashes as the worst case when 21 is the worst case. So the true supremum needs a more complicated ciphertext term involving floor, ceiling, and subtraction.
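A sketch of that tighter bound, under stated assumptions: base64 can yield at most floor(4b / 3) slashes from b bytes (one per full 6-bit group), and the per-slash cost and MAC length are parameters rather than fixed values. The function names are hypothetical. With this answer's figures (3 extra bytes per slash, 87 characters for the MAC) it gives 388 for n = 1, matching the estimate above; a 64-character hex MAC with json_encode()'s default \/ escape (one extra byte per slash) gives 244, and no escaping at all gives 188.

```php
<?php
// Hypothetical tighter worst case; $n = plaintext bytes.
function encodedLen(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3); // base64 length, 4 * ceil(bytes / 3)
}

function worstCasePayloadLength(int $n, int $extraPerSlash, int $macChars): int
{
    $cipherBytes = (intdiv($n, 16) + 1) * 16; // PKCS#7-padded ciphertext
    // Each field: base64 length plus the escape cost of its maximum slash count.
    $ivField    = encodedLen(16) + $extraPerSlash * intdiv(4 * 16, 3);
    $valueField = encodedLen($cipherBytes) + $extraPerSlash * intdiv(4 * $cipherBytes, 3);
    $json       = $ivField + $valueField + $macChars + 10 + 19; // + names + punctuation
    return encodedLen($json); // final base64 of the JSON
}

echo worstCasePayloadLength(1, 3, 87), PHP_EOL; // this answer's assumptions
echo worstCasePayloadLength(1, 1, 64), PHP_EOL; // hex MAC, one-byte "\/" escape
echo worstCasePayloadLength(1, 0, 64), PHP_EOL; // no escaping
```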

bartonjs
  • The HMAC is returned as hexits, which do not have `/` in their alphabet (only `[0-9a-f]`), so there will never be escaped characters in that value. – patricus Mar 13 '20 at 15:09
  • Additionally, the escaped forward slashes have been fixed in 7.0.8, so they will no longer be escaped. – patricus Mar 13 '20 at 15:11

Note I'm going to award the bounty to @Luke Joshua Park, as he got me closest to what ended up being the (closest thing to a) solution, which follows.

(Not a) solution

The answer is that there is no concrete answer, not without unknowns and variance. Across the three people looking at this at the time of writing (myself, Luke, and bartonjs), there was still some doubt about a 100% accurate solution.

The question was posed to figure out a reliable type and size to store encrypted data, ideally in a database independent fashion (I didn't want to specify a particular database, as I wanted to know and understand how to calculate a length regardless of the way it was persisted).

However, even strings of the smallest lengths turned out to be quite long in the worst-case scenario (where a random $iv is created containing many slashes: unlikely or not, it is possible). With encrypted strings for n=1 possibly being 400 bytes long, a varchar will never be the right answer.

So... what should be done?

Instead, it seems best, most consistent and most reliable to store encrypted data in a text field rather than a varchar (in MySQL land), regardless of the length of the original string. This is a disappointingly boring answer with no fancy maths involved. It's not the answer I would like to accept, but it makes the most sense.

But, what about passwords?

In a brief moment of stupidity, I thought, but what about the password field? That is a varchar. But of course that is a hashed value, not an encrypted value (I hadn't had enough coffee when that thought popped into my head, ok?)
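The practical difference shows up in storage: a password hash has a fixed length whatever the input, while encrypted output grows with it. A minimal sketch using PHP's built-in password_hash() with PASSWORD_BCRYPT, assumed here as a stand-in for whatever hasher the application is configured with:

```php
<?php
// bcrypt output is always 60 characters, so a fixed-size varchar fits it.
// Unlike Crypt::encryptString(), the original value cannot be recovered;
// it can only be checked with password_verify().
foreach (['a', str_repeat('x', 50)] as $password) {
    $hash = password_hash($password, PASSWORD_BCRYPT);
    printf("%d chars in -> %d chars out, verifies: %s\n",
        strlen($password), strlen($hash),
        password_verify($password, $hash) ? 'yes' : 'no');
}
```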

Chris
  • Thanks for the bounty @Chris. On a side note, it isn't massively difficult to implement secure encryption with easily predictable length in PHP yourself. I stuck to the scope of the Laravel crypto package because you specified it in the question. I assume you don't want to use anything else? – Luke Joshua Park Feb 24 '19 at 19:42
  • Depending on the above, you might be interested in [this code example](https://github.com/luke-park/SecureCompatibleEncryptionExamples/blob/master/PHP/SCEE.php) from an example code repository that I maintain. Both the `encrypt` and `encryptString` examples have length that is easy to calculate and won't vary due to encoding issues! – Luke Joshua Park Feb 24 '19 at 19:50
  • Wow, fantastic lib @LukeJoshuaPark! For this particular project they want to stick with the built-in algorithms, but for the next project I will recommend this. – Chris Feb 24 '19 at 21:28