
Preamble:

I found out that JavaScript and PHP take different approaches to UTF-8 multibyte character codes:

  • PHP treats a multibyte character as several separate bytes; JS treats it as a single integer (larger than 255).
  • PHP keeps all the auxiliary bits in the codes; JS strips all those bits.

So the code of the Russian letter 'А' will be:

  • 208 and 144 in PHP
  • 1040 in JS
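
For reference, the two results are related by the two-byte UTF-8 layout 110xxxxx 10xxxxxx: the x bits hold the character code, and the leading 110 and 10 bits are the "auxiliary" bits. A minimal sketch in JavaScript, assuming the character fits in two bytes (U+0080 to U+07FF); the function names are made up for illustration:

```javascript
// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx (valid for U+0080..U+07FF only).
// encodeTwoByte and decodeTwoByte are hypothetical helper names.

// Code point -> [lead byte, continuation byte]
function encodeTwoByte(codePoint) {
  const lead = 0xC0 | (codePoint >> 6);            // 110xxxxx: upper 5 payload bits
  const continuation = 0x80 | (codePoint & 0x3F);  // 10xxxxxx: lower 6 payload bits
  return [lead, continuation];
}

// [lead byte, continuation byte] -> code point
function decodeTwoByte(lead, continuation) {
  return ((lead & 0x1F) << 6) | (continuation & 0x3F);
}

console.log('А'.charCodeAt(0));        // 1040
console.log(encodeTwoByte(1040));      // [ 208, 144 ]
console.log(decodeTwoByte(208, 144));  // 1040
```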

Problem description

I need to pass a string through an encoding routine in JS in the client's browser and then decode it in PHP on the server side. To encode and decode the strings I used the JS string method charCodeAt and the PHP function chr(). As I mentioned above, this approach does not work because the codes are different in PHP and JS.

Question

Is there any function in PHP to strip the auxiliary bits from a UTF-8 byte sequence, OR is there any function in JavaScript to add those auxiliary bits to char codes?

Roman Matveev
  • possible duplicate of [PHP function ord() returns wrong code of cirilyc charecter](http://stackoverflow.com/questions/22575085/php-function-ord-returns-wrong-code-of-cirilyc-charecter) – Adrian Preuss Mar 22 '14 at 10:40
  • @AdrianPreuss it is not a duplicate! It is an extension of my previous question. Please read my new question a bit more closely. – Roman Matveev Mar 22 '14 at 10:41

1 Answer


Use the mb_* functions; otherwise split the character into its bytes, read each byte with ord() and combine the values yourself:

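<?php
    // Works only for two-byte UTF-8 sequences (U+0080..U+07FF), such as Cyrillic letters.
    $the_char   = 'А';                 // UTF-8 bytes: 208, 144
    $byte_1     = $the_char[0];        // lead byte:         110xxxxx
    $byte_2     = $the_char[1];        // continuation byte: 10xxxxxx
    // Drop the marker bits (192 = 11000000, 128 = 10000000) and join the payload bits.
    print (ord($byte_1) - 192) * 64 + (ord($byte_2) - 128);   // prints 1040
?>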

You can do the same in JavaScript.
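
On the JavaScript side the useful direction is probably the opposite one: adding the marker bits so that the values match what ord() sees in PHP. A rough sketch, assuming an ES2015+ environment (for...of and codePointAt) and well-formed input; toUtf8Bytes is a hypothetical name, not a built-in:

```javascript
// Convert a JS string to an array of UTF-8 byte values (0..255).
// Assumes well-formed input; lone surrogates get no special handling here.
function toUtf8Bytes(str) {
  const bytes = [];
  for (const ch of str) {                 // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
      bytes.push(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      bytes.push(
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F)
      );
    } else {                              // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      bytes.push(
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F)
      );
    }
  }
  return bytes;
}

console.log(toUtf8Bytes('А'));  // [ 208, 144 ]
```

Current browsers and Node.js also provide TextEncoder; for well-formed strings, new TextEncoder().encode(str) should give the same bytes as a Uint8Array, so the manual version is mainly useful for seeing what happens to the bits.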

Adrian Preuss
  • I see that you are literally casting those UTF-8 auxiliary bits manually. So is there no better way, like a library or an intrinsic function? – Roman Matveev Mar 22 '14 at 10:43
  • Yeah, I've found a small Git project that might interest you: https://github.com/fluxbb/utf8/blob/master/functions/ord.php but it does the same as my snippet. – Adrian Preuss Mar 22 '14 at 10:44