PHP has a function, `hash_hmac`, that computes the HMAC signature of a given string using a given key and algorithm. But HMAC technically operates on binary data, and PHP takes all its parameters here as strings. How does it convert those strings to binary data?
PHP strings can store anything. "Binary data" doesn't mean anything special. It is an expression used to denote any data that may contain non-printable characters (that PHP can handle without problems). – axiac May 12 '20 at 22:02
2 Answers
Short answer: String encoding is just metadata attached to a lump of binary data. PHP strings are just the lump, you have to keep track of the rest.
Long answer:
PHP takes the Honey Badger approach to native string encodings, in other words, "PHP don't care". You give it a sequence of bytes, it stores them. It has no concept of encoding until you want to use a function that cares about it. Even then you need to explicitly declare the input and output encodings, otherwise PHP will go with its configured default which is usually not what anyone actually wants.
```php
// Render a byte string as space-separated hex pairs.
function nice_hex($in) {
    return implode(' ', str_split(bin2hex($in), 2));
}

// Same text, five different byte sequences.
$utf8 = "You owe me €5.";
$utf16le = mb_convert_encoding($utf8, 'utf-16le', 'utf-8');
$utf16be = mb_convert_encoding($utf8, 'utf-16be', 'utf-8');
$iso88591 = mb_convert_encoding($utf8, 'iso-8859-1', 'utf-8');
$cp1252 = mb_convert_encoding($utf8, 'cp1252', 'utf-8');

var_dump(
    $utf8,
    nice_hex($utf8),
    hash_hmac('md5', $utf8, 'foo'),
    $utf16le,
    nice_hex($utf16le),
    hash_hmac('md5', $utf16le, 'foo'),
    $utf16be,
    nice_hex($utf16be),
    hash_hmac('md5', $utf16be, 'foo'),
    $iso88591,
    nice_hex($iso88591),
    hash_hmac('md5', $iso88591, 'foo'),
    $cp1252,
    nice_hex($cp1252),
    hash_hmac('md5', $cp1252, 'foo')
);
```
Output:
string(16) "You owe me €5."
string(47) "59 6f 75 20 6f 77 65 20 6d 65 20 e2 82 ac 35 2e"
string(32) "7724135d91c43906f8730a26dcd76ffb"
string(28) "You owe me � 5."
string(83) "59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 00 ac 20 35 00 2e 00"
string(32) "f4a2347b4a1336dae1db21554c54b9e2"
string(28) "You owe me �5."
string(83) "00 59 00 6f 00 75 00 20 00 6f 00 77 00 65 00 20 00 6d 00 65 00 20 20 ac 00 35 00 2e"
string(32) "b0c1a98d8b853e6568bae513d764a029"
string(14) "You owe me ?5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 3f 35 2e"
string(32) "301a0fb55e23285904413323d10cc774"
string(14) "You owe me �5."
string(41) "59 6f 75 20 6f 77 65 20 6d 65 20 80 35 2e"
string(32) "fa1ee73d39e1a70fe2cde7a8c5bbf0ba"
The output looks the way it does because:
- StackOverflow uses UTF-8.
- My editor uses UTF-8.
- My console uses UTF-8.
- The fact that PHP doesn't care about string encoding lets me produce arbitrarily-encoded trash output like the above quite easily.
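The same point can be seen in isolation with a much smaller sketch. It assumes the mbstring extension is loaded, and it writes the euro sign as explicit byte escapes so the result does not depend on how the file itself is saved. Byte-oriented functions such as `strlen` just count bytes, while the `mb_*` functions only know about characters because an encoding is passed in:

```php
<?php
// Assumption: the mbstring extension is available.
$utf8 = "\xE2\x82\xAC"; // the three UTF-8 bytes of the euro sign

// Convert to Windows-1252, where the euro sign is the single byte 0x80.
$cp1252 = mb_convert_encoding($utf8, 'cp1252', 'utf-8');

$byteLen = strlen($utf8);             // counts bytes, knows nothing about encoding
$charLen = mb_strlen($utf8, 'UTF-8'); // counts characters under the declared encoding

// A lone 0x80 byte is not valid UTF-8, so the data disagrees with the label.
$validUtf8 = mb_check_encoding($cp1252, 'UTF-8');

var_dump($byteLen, $charLen, bin2hex($cp1252), $validUtf8);
```

Nothing in the string itself records which encoding it holds; the `'UTF-8'` and `'cp1252'` labels exist only in the calling code.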
Additional recommended reading: UTF-8 all the way through
Fun Fact: One of the reasons why PHP6 never ended up happening was because they wanted to include native multibyte string encoding but no one could agree on what flavor it should be. Eventually they just scrapped the whole thing and left it up to us the same as it was in PHP5.
Nice. So what determines the encoding used by `hash_hmac` to convert the passed string to bytes? If it's configuration, what configuration? – Tom Hamming May 13 '20 at 18:04
I get how encodings work - I just wasn't quite grasping the fine points here. This question is getting into details of strings in PHP, which is probably a rabbit trail. I might need to open another question at some point. Thanks! – Tom Hamming May 13 '20 at 20:09
It's just UTF-8 (for string literals, assuming the source file is saved as UTF-8).
You can put whatever encoding you want in a string. `hash_hmac()` doesn't use any specific encoding; it hashes whatever bytes your string already contains.
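Since `hash_hmac()` only sees bytes, two parties that want matching signatures have to agree on an encoding before hashing. The following is a minimal sketch of one way to do that, not a prescribed method: `hmac_utf8` is a hypothetical helper, the choice of SHA-256 and of UTF-8 as the canonical encoding are assumptions, and the caller is assumed to know the encoding of the incoming data (for example from an HTTP `Content-Type` header).

```php
<?php
// Hypothetical helper (not part of PHP): normalize to UTF-8, then sign.
// Assumption: the caller knows the real encoding of $message.
function hmac_utf8(string $message, string $fromEncoding, string $key): string
{
    $utf8 = mb_convert_encoding($message, 'UTF-8', $fromEncoding);
    return hash_hmac('sha256', $utf8, $key);
}

$utf8Msg  = "You owe me \xE2\x82\xAC5.";                        // UTF-8 bytes
$utf16Msg = mb_convert_encoding($utf8Msg, 'UTF-16LE', 'UTF-8'); // same text, other bytes

// Hashing the raw strings gives different signatures, because the bytes differ.
$rawDiffers = hash_hmac('sha256', $utf8Msg, 'foo') !== hash_hmac('sha256', $utf16Msg, 'foo');

// After normalizing both to UTF-8, the signatures agree.
$normalizedMatch = hash_equals(
    hmac_utf8($utf8Msg, 'UTF-8', 'foo'),
    hmac_utf8($utf16Msg, 'UTF-16LE', 'foo')
);

var_dump($rawDiffers, $normalizedMatch);
```

If the declared encoding is wrong, the conversion will silently produce different bytes and the signatures will not match, so the encoding label has to come from somewhere trustworthy.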
Here's an example from Wikipedia using UTF-8 encoding and running an HMAC algorithm over the resulting bytes:
HMAC_MD5("key", "The quick brown fox jumps over the lazy dog")
= 80070713463e7749b90c2dc24911e275
And here's the equivalent PHP code, which produces the same result:
php > echo hash_hmac('md5', "The quick brown fox jumps over the lazy dog", "key");
80070713463e7749b90c2dc24911e275
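Because PHP strings are plain byte sequences, the key and the result can be raw binary as well. A small sketch, assuming a key supplied as hex (the key and message here follow the first HMAC-MD5 test case in RFC 2202): `hex2bin` turns the hex into the actual key bytes, and passing `true` as the fourth argument of `hash_hmac` returns the raw digest bytes instead of a hex string.

```php
<?php
// Key of sixteen 0x0b bytes, built from its hex representation.
$key = hex2bin(str_repeat('0b', 16));

// Default: lowercase hex string (32 characters for MD5).
$hex = hash_hmac('md5', 'Hi There', $key);

// Fourth argument true: raw binary digest (16 bytes for MD5).
$raw = hash_hmac('md5', 'Hi There', $key, true);

// The two forms carry the same bytes, just presented differently.
$sameBytes = bin2hex($raw) === $hex;

var_dump($hex, strlen($raw), $sameBytes);
```

No conversion step happens inside `hash_hmac`: the key bytes produced by `hex2bin` and the message bytes are fed to the algorithm exactly as they are stored.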

What makes PHP use UTF-8? Is it based on a configuration somewhere? Some intrinsic part of the string implementation? – Tom Hamming May 12 '20 at 22:46
String literals in PHP [are UTF-8](https://www.php.net/manual/en/xml.encoding.php). However, you can import strings in other encodings, or convert to other encodings. – Joundill May 12 '20 at 22:52
Wrong. PHP strings are 100% binary and contain the exact bytes you put into them with no encoding. Certain operations might _assume_ a given encoding based on the configured defaults, but if the data disagrees with the assumed encoding you'll get corrupted output. – Sammitch May 12 '20 at 23:06
@Sammitch ok, so if you pass a literal string to `hash_hmac` as above, what are the "exact bytes"? Still UTF-8? What if the string you pass comes from the body of an HTTP request that was encoded in something else for some reason, like UTF-16? – Tom Hamming May 12 '20 at 23:16
@TomHamming They're right in as much as a string is just a string of bytes. You can stick whatever encoding you want in it. I should really update my answer to reflect that. – Joundill May 13 '20 at 22:44