
When calling the `md5` function in PHP and passing a string as the argument, I would guess that PHP has to convert the string to bytes to perform the hash operation. What encoding does it use when converting from string to bytes?

I'm trying to port the following PHP code to .NET.

In .NET a string can't be hashed directly; it first has to be converted to a byte array or a stream, so I need to know which encoding PHP uses.

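```php
<?php
$params = $_GET;
$var = "";
foreach ($params as $key => $value)
{
    // Strict comparison: with "!=", PHP < 8 treats an integer key 0 as equal to "hash".
    if ($key !== "hash")
    {
        $var .= $value;
    }
}
$genstamp = md5($var . "SecretMD5Key");
// hash_equals() avoids PHP's loose "!=" comparison, under which two
// numeric-looking strings such as "0e123" and "0e456" compare as equal.
if (!isset($_GET["hash"]) || !is_string($_GET["hash"]) || !hash_equals($genstamp, $_GET["hash"]))
{
    echo "Hash is not valid";
    exit();
}
else
{
    // Hash is OK
}
?>
```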
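For reference, the message this script signs can be sketched in C, the language PHP itself is implemented in. This is a minimal sketch under stated assumptions: `struct param` and `build_message` are hypothetical names, the parameters are assumed to be already URL-decoded and kept in query-string order (as `$_GET` is), and NUL-terminated C strings are used for brevity even though PHP strings may contain NUL bytes. It shows that the values are concatenated with no separator and no character conversion, so the bytes fed to `md5` depend on parameter order and on the exact bytes of each value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one already-decoded query parameter. The array passed to
   build_message is assumed to be in query-string order, as $_GET is. */
struct param {
    const char *key;
    const char *value;
};

/* Concatenates, with no separator, the value of every parameter whose key
   is not "hash", then appends the secret, mirroring the PHP loop.
   Writes a NUL-terminated result to out and returns its length, or
   (size_t)-1 if out (out_size bytes) is too small. */
size_t build_message(const struct param *params, size_t n,
                     const char *secret, char *out, size_t out_size)
{
    assert((params != NULL || n == 0) && secret != NULL && out != NULL);
    size_t len = 0;
    for (size_t i = 0; i <= n; i++) {
        if (i < n && strcmp(params[i].key, "hash") == 0)
            continue;                        /* skip the signature itself */
        const char *piece = (i < n) ? params[i].value : secret;
        size_t plen = strlen(piece);
        if (len + plen + 1 > out_size)       /* keep room for the NUL */
            return (size_t)-1;
        memcpy(out + len, piece, plen);
        len += plen;
    }
    out[len] = '\0';
    return len;
}
```

With the parameters `a=1`, `hash=...`, `b=two`, the sketch produces the message `1twoSecretMD5Key`, which is what the PHP script would pass to `md5`.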
Peter

2 Answers


PHP strings are not "encoded"; they are more like byte arrays. It is the programmer's responsibility to make sure the code isn't doing something stupid (like concatenating a UTF-8 string with an ISO-8859 one, or using a Unicode function on a non-Unicode string). Generally this makes things hard, but at least you know exactly what `md5` is going to hash: the string's bytes as they are, which depend entirely on where the string came from (the settings of a database driver, the encoding of the page that hosted the form for `$_REQUEST` values, etc.).
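To make "it depends entirely on the source of the string" concrete: the same word has different byte representations in different encodings, and `md5` only ever sees the bytes. The C sketch below (the function name `latin1_to_utf8` is hypothetical) re-encodes ISO-8859-1 bytes as UTF-8. "café" is 4 bytes (`63 61 66 E9`) in ISO-8859-1 but 5 bytes (`63 61 66 C3 A9`) in UTF-8, so the two spellings hash to different digests, and a .NET port has to encode the string with whichever encoding produced the bytes on the PHP side.

```c
#include <assert.h>
#include <stddef.h>

/* Re-encodes ISO-8859-1 bytes as UTF-8. Every Latin-1 byte maps to the
   Unicode code point with the same value, so bytes below 0x80 are copied
   and bytes 0x80-0xFF become a two-byte UTF-8 sequence.
   out must have room for 2 * len bytes; returns the UTF-8 length. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(in != NULL || len == 0);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                  /* ASCII: unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    return o;
}
```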

fdreger
  • They are exactly like byte arrays, for the purposes of `md5`. One issue is that the source encoding is often poorly specified. For instance, what's the encoding of a PHP string literal? (As far as I can tell it's UTF-8, but that doesn't seem obvious or guaranteed by the docs.) – pvg Mar 09 '17 at 08:00
  • To answer my question, then: what encoding does `$_GET` have? – Peter Mar 09 '17 at 08:01
  • @Peter whatever encoding the sender used – Gordon Mar 09 '17 at 08:19
  • @Gordon I'm not sure that's actually true as specified. – pvg Mar 09 '17 at 08:26
  • @pvg there is nothing in https://github.com/php/php-src/blob/master/main/php_variables.c indicating that any request input has its encoding changed. – Gordon Mar 09 '17 at 08:44
  • @pvg Additional useful reading: http://stackoverflow.com/questions/27345626/what-is-the-character-set-if-default-charset-is-empty – Gordon Mar 09 '17 at 08:50
  • @Peter: Absolutely no encoding at all! It's just the string of bytes that was sent. – fdreger Mar 09 '17 at 08:55
  • @Gordon right, I meant that I don't know what the protocol behaviour is supposed to be, rather than what PHP v.whatever does. Does a GET param _have_ to be UTF-8? Maybe someone with a better memory and grasp of RFC legalese remembers, because I don't. – pvg Mar 09 '17 at 09:02
  • @pvg GET parameters are sort of a blind spot in all the relevant standards. In practice, browsers will always use the encoding of the originating page (which is not really that logical, because it's generally the target server that needs to interpret them, not the originating one; but in the case of GET params in web applications it works, because they control both sides). – fdreger Mar 09 '17 at 09:08
  • @pvg and there is no "encoding of a PHP literal" - it depends fully on the encoding used to write the PHP file. PHP stores strings as exactly the byte sequences used in source files. – fdreger Mar 09 '17 at 09:11
  • @fdreger `echo "\u{1F602}";` Pedantically, though, I'm not sure that's strictly true either. – pvg Mar 09 '17 at 09:39
  • Gotta say, this was a lot more complicated than I guessed. I'm still confused about how PHP can just ignore the concept of encoding entirely, but hey, I got my answer and I'm happy. – Peter Mar 09 '17 at 12:21
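Whatever the page encoding, a GET value reaches PHP as percent-encoded bytes in the query string, and decoding it is a byte-level operation. Below is a minimal C sketch, assuming the `application/x-www-form-urlencoded` rules (`+` for space, `%XX` for one byte) and roughly mirroring PHP's `urldecode()`, which copies malformed escapes through unchanged. `form_urldecode` and `hex_value` are hypothetical names, not PHP internals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit (including NUL). */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical decoder for one query-string value: '+' becomes a space
   and "%XX" becomes the single byte 0xXX; malformed escapes are copied
   through unchanged. No character set is consulted: the output is just
   the bytes the client percent-encoded. out needs strlen(in) + 1 bytes;
   returns the decoded length (the bytes may include NUL). */
size_t form_urldecode(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        int hi, lo;
        if (c == '+') {
            out[o++] = ' ';
        } else if (c == '%'
                   && (hi = hex_value((unsigned char)in[i + 1])) >= 0
                   && (lo = hex_value((unsigned char)in[i + 2])) >= 0) {
            out[o++] = (unsigned char)(hi * 16 + lo);
            i += 2;                      /* consume the two hex digits */
        } else {
            out[o++] = c;
        }
    }
    out[o] = '\0';
    return o;
}
```

Decoding `caf%C3%A9` and `caf%E9` yields the UTF-8 and the ISO-8859-1 bytes of "café" respectively; nothing in the decoder knows or cares which one the sender meant.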

The md5 function is defined in

It uses this Public Domain implementation:

PHP's code to call the implementation is just this:

```c
/* {{{ proto string md5(string str, [ bool raw_output])
   Calculate the md5 hash of a string */
PHP_NAMED_FUNCTION(php_if_md5)
{
    zend_string *arg;
    zend_bool raw_output = 0;
    char md5str[33];
    PHP_MD5_CTX context;
    unsigned char digest[16];

    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_STR(arg)
        Z_PARAM_OPTIONAL
        Z_PARAM_BOOL(raw_output)
    ZEND_PARSE_PARAMETERS_END();

    md5str[0] = '\0';
    PHP_MD5Init(&context);
    PHP_MD5Update(&context, ZSTR_VAL(arg), ZSTR_LEN(arg));
    PHP_MD5Final(digest, &context);
    if (raw_output) {
        RETURN_STRINGL((char *) digest, 16);
    } else {
        make_digest_ex(md5str, digest, 16);
        RETVAL_STRING(md5str);
    }

}
```
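When `raw_output` is false, the 16-byte digest is turned into the 32-character string that `md5()` returns. The body of `make_digest_ex` isn't shown here, so the following is an assumption about its output rather than its code: a C sketch (hypothetical name `digest_to_hex`) that writes two lowercase hex digits per byte plus a terminating NUL, which is why `md5str` is 33 bytes long. Since the PHP script compares this string with `$_GET["hash"]`, a port has to produce lowercase hex as well.

```c
#include <assert.h>
#include <stddef.h>

/* Formats a binary digest as lowercase hexadecimal, two characters per
   byte, NUL-terminated. out needs 2 * len + 1 bytes (33 for a 16-byte
   MD5 digest). */
void digest_to_hex(const unsigned char *digest, size_t len, char *out)
{
    static const char hexits[] = "0123456789abcdef";
    assert(digest != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hexits[digest[i] >> 4];    /* high nibble first */
        out[2 * i + 1] = hexits[digest[i] & 0x0F];  /* then low nibble */
    }
    out[2 * len] = '\0';
}
```

For example, the 16 raw bytes of the MD5 of the empty string format as `d41d8cd98f00b204e9800998ecf8427e`, the value `md5("")` returns.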

As already mentioned elsewhere on this page, no conversion takes place. The raw bytes of the PHP string (`ZSTR_VAL(arg)`, `ZSTR_LEN(arg)`) are passed unchanged to the MD5 implementation, which calculates the hash via these functions:

```c
PHPAPI void PHP_MD5Init(PHP_MD5_CTX *ctx);
PHPAPI void PHP_MD5Update(PHP_MD5_CTX *ctx, const void *data, size_t size);
PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx);
```
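Because nothing but bytes goes in, any correct MD5 over the same byte sequence reproduces PHP's result. As an illustration, here is a compact one-shot MD5 in C written from RFC 1321; it is not PHP's bundled implementation, and `md5_bytes` is a hypothetical name. Calling it once on a whole buffer corresponds to one `PHP_MD5Init` / `PHP_MD5Update` / `PHP_MD5Final` sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-step additive constants from RFC 1321 (floor(abs(sin(i + 1)) * 2^32)). */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};

/* Per-step left-rotation amounts. */
static const unsigned S[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

static uint32_t rotl32(uint32_t x, unsigned c)
{
    return (x << c) | (x >> (32 - c));
}

/* Mixes one 64-byte block into the four 32-bit state words. */
static void md5_block(uint32_t st[4], const unsigned char *p)
{
    uint32_t m[16];
    for (int i = 0; i < 16; i++)   /* message words are little-endian */
        m[i] = (uint32_t)p[4 * i] | ((uint32_t)p[4 * i + 1] << 8)
             | ((uint32_t)p[4 * i + 2] << 16) | ((uint32_t)p[4 * i + 3] << 24);

    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];
    for (int i = 0; i < 64; i++) {
        uint32_t f;
        int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        uint32_t tmp = d;
        d = c;
        c = b;
        b = b + rotl32(a + f + K[i] + m[g], S[i]);
        a = tmp;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d;
}

/* Hashes len raw bytes into a 16-byte digest; no text encoding is involved. */
void md5_bytes(const unsigned char *data, size_t len, unsigned char out[16])
{
    assert(data != NULL || len == 0);
    uint32_t st[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };

    size_t i = 0;
    for (; i + 64 <= len; i += 64)          /* all complete blocks */
        md5_block(st, data + i);

    /* Padding: the leftover bytes, 0x80, zeros, then the message length
       in bits as a 64-bit little-endian integer; one or two blocks. */
    unsigned char tail[128] = { 0 };
    size_t rem = len - i;
    if (rem > 0)
        memcpy(tail, data + i, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        tail[tail_len - 8 + j] = (unsigned char)(bits >> (8 * j));
    md5_block(st, tail);
    if (tail_len == 128)
        md5_block(st, tail + 64);

    for (int j = 0; j < 16; j++)            /* state words, little-endian */
        out[j] = (unsigned char)(st[j / 4] >> (8 * (j % 4)));
}
```

The same reasoning applies to the .NET port: once the string has been turned into bytes with the encoding the client actually used (for example `Encoding.UTF8`, if the sending page was UTF-8), any MD5 implementation over those bytes gives the digest PHP computed.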
Gordon