
I tried to XOR two strings in PHP and in JS and I got different results:

PHP function

function xh($a, $b) {
  $res = ""; $i = strlen($a); $j = strlen($b);
  while($i-->0 && $j-->0) {
    $res.= $a[$i] ^ $b[$j];
  }
  return base64_encode($res);
}

JS function

function xh(a, b) {
  var res = "", i = a.length, j = b.length;
  while (i-->0 && j-->0) {
    res+= String.fromCharCode(a.charCodeAt(i) ^ b.charCodeAt(j));
  }
  return btoa(res);
}

I examined the bytes and found that the sixth bit of every byte in the PHP output is always zero, so I updated the JS function to

JS function equivalent to PHP

function xh2(a, b) {
  var res = "", i = a.length, j = b.length;
  while (i-->0 && j-->0) {
    res+= String.fromCharCode((a.charCodeAt(i) ^ b.charCodeAt(j)) & 95);
  }
  return btoa(res);
}

So what is happening to that bit?

Example input/output:

string a: 5D41402ABC4B2A76B9719D911017C592
string b: FE2D010308A6B3799A3D9C728EE74244
PHP says: Bg0HVwBUVQkDDgcAVQRYWw8AUlBUVVtSUgIBBFUGAVM=
 JS says: Bg0HdwB0dQkDDgcAdQR4ew8AcnB0dXtycgIBBHUGAXM=
JS2 says: Bg0HVwBUVQkDDgcAVQRYWw8AUlBUVVtSUgIBBFUGAVM=

First difference in this example:

C: 0x43  = 0100 0011
4: 0x34  = 0011 0100
C^4 (JS) = 0111 0111 = 0x77 (correct)
C^4 (PHP)= 0101 0111 = 0x57
             ^
             sixth bit wrong

The inputs are MD5 hashes. I use the default encoding; my OEM charset is CP1250 and my locale is cs-cz. The files are stored in UTF-8, and the page is served with the HTTP header text/html;charset=UTF-8 and a UTF-8 meta tag, in case any of this matters.

My web server is Mongoose 6.7 with the bundled PHP 5.6 (CGI). I also tried the latest PHP 7.3 (x86 and x64) with the same results; however, @apokryfos tested it in the comments and got the sixth bit correct.

Jan Turoň
  • `-->` what is this black magic? – Dominic Oct 15 '19 at 08:15
  • @Dominic `(i--)>0` – Jan Turoň Oct 15 '19 at 08:16
  • It means `(i--)> 0` – LF00 Oct 15 '19 at 08:16
  • Are the JS and PHP scripts both working on the same character encoding? As you're using the PHP strlen function I can deduce that either you're working with ASCII, or you're not aware that strlen isn't appropriate for multibyte encodings. – GordonM Oct 15 '19 at 08:47
  • @GordonM the input strings are MD5 hashes, every character should be single byte. – Jan Turoň Oct 15 '19 at 08:51
  • @04FS The difference appears in the XOR operation (it is base64 encoded just for better readability and transfers) – Jan Turoň Oct 15 '19 at 08:52
  • @JanTuroň If Javascript strings are something like UTF8 by default then that would be true, but if they're UTF-16... – GordonM Oct 15 '19 at 08:53
  • @GordonM I added some additional info about my system – Jan Turoň Oct 15 '19 at 08:59
  • There must be something wrong with your PHP. [This sandbox](http://sandbox.onlinephpfunctions.com/code/00e8fcf67757461651d73dc65f63c02a50c3402f) returns `Bg0HdwB0dQkDDgcAdQR4ew8AcnB0dXtycgIBBHUGAXM=` in PHP – apokryfos Oct 15 '19 at 10:13
  • @apokryfos interesting. I downloaded latest PHP 7.3 version, x86 and x64, thread safe or not, still the same difference. It must be the mongoose web server then. – Jan Turoň Oct 15 '19 at 10:36
  • Try passing the strings as literals first without getting mongoose involved at all, see if that helps. – apokryfos Oct 15 '19 at 10:41
  • @apokryfos thanks for the tip with literals - I found out that the problem was on my client-side library which created the hashes uppercase, see the explanation in my answer. – Jan Turoň Oct 15 '19 at 11:20

2 Answers


The root of the problem is case sensitivity: it seems that some buggy MD5 implementations don't lowercase their output. Two different libraries were used on the client side and on the server side, and the client-side one produced uppercase hashes.

'A' starts at 0x41 = 0100 0001
'a' starts at 0x61 = 0110 0001
                       ^
                       here is the sixth bit
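
The effect can be reproduced with a short sketch. The helper names `xorChars` and `xhNormalized` are made up for this illustration, and lowercasing both inputs is only one possible way to make the two sides agree; the base64 step from the question is left out so the raw character codes stay visible.

```javascript
// XOR the character codes of two single characters.
function xorChars(x, y) {
  return x.charCodeAt(0) ^ y.charCodeAt(0);
}

// Hypothetical variant of xh() from the question: lowercase both inputs
// first, then XOR them from the end exactly as the original loop does.
function xhNormalized(a, b) {
  a = a.toLowerCase();
  b = b.toLowerCase();
  var res = "", i = a.length, j = b.length;
  while (i-- > 0 && j-- > 0) {
    res += String.fromCharCode(a.charCodeAt(i) ^ b.charCodeAt(j));
  }
  return res; // raw XOR result, not base64 encoded
}

var upper = xorChars("C", "4"); // 0x43 ^ 0x34 = 0x77 (uppercase hash)
var lower = xorChars("c", "4"); // 0x63 ^ 0x34 = 0x57 (lowercase hash)
var diff = upper ^ lower;       // 0x20, the sixth bit
```

With both inputs normalized, it no longer matters which case each MD5 library emits.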
Jan Turoň

For JS, use a buffer or typed array instead of a string. Otherwise you need some binary-safe string encoding.

You can XOR two strings in PHP in their entirety: $a ^ $b (don't forget length checking).

See: https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary
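
A minimal sketch of the typed-array approach, assuming the inputs contain only single-byte characters (as hex digests do). `toBytes` and `xorBytes` are hypothetical helper names, and unlike the loop in the question this version walks the inputs front to back and insists on equal lengths.

```javascript
// Copy a string of single-byte characters into a Uint8Array.
// Assumption: every character code is below 256; anything else is rejected.
function toBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError("character at index " + i + " is not a single byte");
    }
    bytes[i] = code;
  }
  return bytes;
}

// XOR two byte arrays position by position, with the length check
// mentioned above.
function xorBytes(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("inputs must have the same length");
  }
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] ^ b[i];
  }
  return out;
}

// "C" ^ "4" = 0x77 and "9" ^ "4" = 0x0D
const result = xorBytes(toBytes("C9"), toBytes("44"));
```

Working on bytes keeps the result independent of how the JS engine stores strings internally; the output can be base64 encoded afterwards if it has to be transferred as text.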

I get Bg0HdwB0dQkDDgcAdQR4ew8AcnB0dXtycgIBBHUGAXM= from PHP with your code so something else is going on.

Can you provide the PHP version and build / source?

jgmjgm
  • It makes no difference - see the updated example in the question. – Jan Turoň Oct 15 '19 at 09:34
  • I'm certain that PHP doesn't work with XOR on strings the way you're doing it. `chr(ord($a[$i]) ^ ord($b[$j]));` should produce something closer to JS. JS will also differ though as PHP strings are byte arrays but JS strings are unicode char arrays. You simply cannot use JS strings for binary operations safely the same way you can in PHP. You can however make PHP do the same as JS using mbstring or similar. You can make the JS match PHP using a byte array. – jgmjgm Oct 15 '19 at 09:44
  • Saying that, looks like xor does work on string bytes in PHP, which makes me wonder if you can xor the whole string instead of the loop? – jgmjgm Oct 15 '19 at 09:46
  • It may need & 255? – jgmjgm Oct 15 '19 at 10:03
  • See the latest edit. The problem is weirder than it looks. – jgmjgm Oct 15 '19 at 11:06
  • Thanks for the effort, the root of the problem was elsewhere, see my answer. My bad, sorry - I was using font with very similar upper and lower case. – Jan Turoň Oct 15 '19 at 11:18