
I have a hashing method in C# that looks like:

MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();

byte[] raw_input  = Encoding.UTF32.GetBytes("hello");
byte[] raw_output = md5.ComputeHash(raw_input);

string output = "";
foreach (byte myByte in raw_output)
    output += myByte.ToString("X2");

return output;

How can I implement this in PHP? Doing the following produces a different hash digest...

$output = hash('md5', 'hello');
Jesse
  • Are you sure PHP is using UTF-32? That sounds very unlikely to me. – Jon Skeet Jul 24 '12 at 19:38
  • Please post the hashes you are getting for both – Leigh Jul 24 '12 at 19:39
  • Does this help? http://stackoverflow.com/questions/821817/php-md5-algorithm-that-gives-same-result-as-c-sharp?rq=1 – Nanne Jul 24 '12 at 19:39
  • The hashes for the string "admin": In C# = "1E3FCD02B1547F847CB7FC3ADD4484A5" and in PHP = "21232f297a57a5a743894a0e4a801fc3". How can I set PHP to use UTF-32? – Jesse Jul 24 '12 at 19:42

3 Answers


You need to find out which encoding PHP is using to convert your string to bytes. It's very unlikely that it's using UTF-32. It may well be using the platform default encoding, or possibly UTF-8.

using (MD5 md5 = MD5.Create())
{
    byte[] input = Encoding.UTF8.GetBytes("hello");
    byte[] hash = md5.ComputeHash(input);
    return BitConverter.ToString(hash).Replace("-", "");
}

(This is the problem with languages/platforms which treat strings as binary data all over the place - it doesn't make it clear what's going on. There has to be a conversion to bytes here, as MD5 is defined for bytes, not Unicode characters. In the C# code you're doing it explicitly... in the PHP it's implicit and poorly documented.)
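To make that implicit step visible: a PHP string is just a sequence of bytes, and md5() hashes those bytes unchanged. This small sketch uses only core PHP 7+ (the \u{...} escape stores a code point as its UTF-8 bytes), and suggests that the question's hash('md5', 'hello') is an MD5 over the literal's UTF-8/ASCII bytes rather than over UTF-32.

```php
<?php
// A PHP string is a byte array; "\u{e9}" (PHP 7+) is stored as UTF-8 bytes C3 A9.
$word = "h\u{e9}llo";

echo strlen($word), "\n";   // 6: five characters, but "é" takes two bytes in UTF-8
echo bin2hex($word), "\n";  // 68c3a96c6c6f

// md5() hashes exactly the bytes it is given; no encoding conversion happens.
echo md5("hello"), "\n";    // 5d41402abc4b2a76b9719d911017c592
```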

EDIT: If you've got to change the PHP, you could try this:

$text = mb_convert_encoding($text, "UTF-32LE", "UTF-8"); // assumes $text holds UTF-8
$output = md5($text);

It depends on whether PHP supports UTF-32, though...

Jon Skeet
  • As I told WouterH, I cannot change the C# code. I need to be able to create identical hashes as the unchanged C# code via PHP. – Jesse Jul 24 '12 at 19:44
  • @Jesse: This is the sort of thing you need to include in a question in future. – Jon Skeet Jul 24 '12 at 19:45
  • @Jesse: See my edit. It *may* work, but I can't try it here. Worth a try. – Jon Skeet Jul 24 '12 at 19:47
  • @Jesse: Yes, but you didn't say that you couldn't change the C#. UTF-32 is a pretty odd choice of encoding. – Jon Skeet Jul 24 '12 at 19:47
  • Hmm, that doesn't seem to work either. It produces a different digest of course, but still not the same as the C# digest. This is very strange. – Jesse Jul 24 '12 at 19:50
  • @JonSkeet: Does C# add a BOM when you convert to UTF-32 like that? – Leigh Jul 24 '12 at 19:52
  • There's still the question of what `mb_convert_encoding` will convert *from*. The docs say it uses the "internal encoding", but that can be changed apparently, so it might (or might not) be something else entirely from what PHP string literals are parsed as. – millimoose Jul 24 '12 at 19:52
  • (My guess would be they're not parsed as anything, but they're the bytes present in the source file in whatever encoding it was saved as. Mostly because that's such a PHP thing to do.) – millimoose Jul 24 '12 at 19:53
  • The C# UTF32 encoding is actually little-endian, so you need to convert to `"UTF-32LE"`. This is strange, as I recall UTF-32 having to default to big-endian if the endianness isn't specified. – Esailija Jul 24 '12 at 19:54
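The endianness point (and Leigh's BOM question) can be checked with core PHP alone. pack() format 'V' writes an unsigned 32-bit little-endian integer and 'N' a big-endian one, which is how a single code point is laid out in UTF-32LE versus UTF-32BE. As far as I know, .NET's Encoding.UTF32 is little-endian and GetBytes() never prepends a BOM (that only comes from GetPreamble()), so the LE layout without a BOM is the one to match. A minimal sketch:

```php
<?php
// One code point, U+0061 ('a'), laid out as a 32-bit integer.
$le = pack('V', 0x61);  // little-endian: the layout C#'s Encoding.UTF32 uses
$be = pack('N', 0x61);  // big-endian: the Unicode default when there is no BOM

echo bin2hex($le), "\n";  // 61000000
echo bin2hex($be), "\n";  // 00000061

// Same text, different bytes, therefore a different digest.
echo md5($le) === md5($be) ? "same\n" : "different\n";  // different
```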

PHP

This PHP code will do:

<?php
$str = "admin";
// C#'s Encoding.UTF32 is UTF-32 little-endian; name the source encoding explicitly.
$strUtf32 = mb_convert_encoding($str, "UTF-32LE", "UTF-8");
echo md5($strUtf32);
?>

This code outputs "1e3fcd02b1547f847cb7fc3add4484a5"
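One remaining difference matters if the two strings are compared directly: the C# loop formats each byte with "X2", which gives uppercase hex, while PHP's md5() returns lowercase. A small wrapper that also names the source encoding explicitly (instead of relying on mb_internal_encoding()) might look like this. The function name csharp_utf32_md5 is hypothetical, and the sketch assumes the mbstring extension is installed and the input is UTF-8.

```php
<?php
// Hypothetical helper: mimic the C# code
//   md5.ComputeHash(Encoding.UTF32.GetBytes(s)) formatted with ToString("X2").
// Assumes mbstring is available and $text is UTF-8.
function csharp_utf32_md5(string $text): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Encoding.UTF32 in .NET is little-endian, and GetBytes() adds no BOM.
    $bytes = mb_convert_encoding($text, 'UTF-32LE', 'UTF-8');
    // "X2" in C# yields uppercase hex; md5() yields lowercase.
    return strtoupper(md5($bytes));
}

echo csharp_utf32_md5('admin'), "\n";  // 1E3FCD02B1547F847CB7FC3ADD4484A5
```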

huysentruitw
  • I cannot change the C# code. I need to be able to create identical hashes as the unchanged C# code via PHP. – Jesse Jul 24 '12 at 19:43
  • @Leigh: I don't think a BOM is added here. It just defines the order of the 4 bytes for each character. – huysentruitw Jul 24 '12 at 19:56
  • @Leigh: Yeah, but Esailija was looking over my shoulder ;) – huysentruitw Jul 24 '12 at 19:58
  • @WouterH that's not fair, you had the same solution only 30 seconds earlier than I. I am not familiar with c# at all so it was [a big wtf for me that it defaulted to LE](http://unicode.org/faq/utf_bom.html#gen6) :P – Esailija Jul 24 '12 at 19:59
  • @Esailija: that's because of x86 using LE. – huysentruitw Jul 24 '12 at 20:02
  • Yeah, but the link clearly says that if endianness isn't specified, it's determined by the BOM. If there is no BOM, it's big-endian. In fact, Jon Skeet's answer fails because PHP correctly defaults to big-endian. – Esailija Jul 24 '12 at 20:04
  • Point to note: the string you're encoding has to be in UTF-8, which is by no means something you can rely on. If they're string literals they'll be in whatever your text editor / IDE is set to; if they come from an HTML form, they'll be in whatever encoding you sent the page in; if they're read from a database, they'll be gods know what given the number of moving parts involved. – millimoose Jul 24 '12 at 20:14
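To illustrate that caveat: the same character can arrive as different bytes depending on its source, and mb_convert_encoding() only produces the right UTF-32LE bytes when it is told the real source encoding. This sketch assumes mbstring and uses "é" as it would arrive from a UTF-8 source versus an ISO-8859-1 one.

```php
<?php
// 'é' (U+00E9) as it might arrive from two differently encoded sources.
$fromUtf8   = "\u{e9}";  // bytes C3 A9
$fromLatin1 = "\xE9";    // byte  E9

// Declaring the correct source encoding gives identical UTF-32LE bytes.
$a = mb_convert_encoding($fromUtf8, 'UTF-32LE', 'UTF-8');
$b = mb_convert_encoding($fromLatin1, 'UTF-32LE', 'ISO-8859-1');
echo bin2hex($a), ' ', bin2hex($b), "\n";  // e9000000 e9000000

// Declaring the wrong one silently changes the bytes, and therefore the hash.
$wrong = mb_convert_encoding($fromUtf8, 'UTF-32LE', 'ISO-8859-1');
echo bin2hex($wrong), "\n";  // c3000000a9000000
```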

When you apply md5 to Encoding.UTF32.GetBytes("admin"), that's the same as doing this in PHP:

echo hash( "md5","a\0\0\0d\0\0\0m\0\0\0i\0\0\0n\0\0\0");
//1e3fcd02b1547f847cb7fc3add4484a5

You need to convert your string to UTF-32LE in PHP:

echo md5( mb_convert_encoding( "admin", "UTF-32LE" ) );
//1e3fcd02b1547f847cb7fc3add4484a5
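If mbstring is not installed, the hand-written "a\0\0\0d\0\0\0..." idea above can be generalized with core PHP only: split the UTF-8 string into characters with a /u regex, decode each to its code point, and pack it as a 32-bit little-endian integer. This is a sketch assuming valid UTF-8 input; utf8_to_utf32le is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a built-in): UTF-8 -> UTF-32LE without mbstring or iconv.
function utf8_to_utf32le(string $s): string
{
    // '//u' splits into whole UTF-8 characters; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $out = '';
    foreach ($chars as $ch) {
        $b = array_values(unpack('C*', $ch));  // the 1 to 4 bytes of this character
        switch (count($b)) {
            case 1: $cp = $b[0]; break;
            case 2: $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F); break;
            case 3: $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F); break;
            default: $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                         | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        $out .= pack('V', $cp);  // 'V' = unsigned 32-bit little-endian
    }
    return $out;
}

echo md5(utf8_to_utf32le('admin')), "\n";  // 1e3fcd02b1547f847cb7fc3add4484a5
```

Decoding to code points first means characters outside ASCII (for example "€" or emoji) get the same 4-byte layout that Encoding.UTF32 would produce.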
Esailija