According to ISO 8859-1, the € symbol has decimal value 128.

My default PHP script encoding is:

echo mb_internal_encoding(); //ISO-8859-1

So now, in PHP:

echo chr(128);  //Outputs exactly what I want: '€'

But

echo ord('€');  //opposite it returns 226, it should be 128

Why is that?

Dhairya Lakhera
  • Did you read the manual? It's kinda explained in there... [ord()](http://php.net/manual/en/function.ord.php) – Naruto Feb 23 '16 at 11:05
  • Yes, I have read it; it says the ord() function complements chr(). – Dhairya Lakhera Feb 23 '16 at 11:09
  • Have you also read the comments on the page @Naruto has linked to? Especially the second one? It explains in detail why `ord()` doesn't work with utf-8: `For single-byte encodings such as (7-bit) ASCII and the ISO 8859 family, this will correspond to the first character, and will be the position of that character in the encoding's mapping table. For multi-byte encodings, such as UTF-8 or UTF-16, the byte may not represent a complete character` – Michel Feb 23 '16 at 11:19
  • My default encoding is **echo mb_internal_encoding(); //ISO-8859-1** – Dhairya Lakhera Feb 23 '16 at 11:19
  • But chr() also supports single-byte encodings. Why does it output '€' for the decimal value 128? – Dhairya Lakhera Feb 23 '16 at 11:23
  • 2018! See the new [**mb_ord()**](http://php.net/manual/en/function.mb-ord.php) for **PHP v7.2.0+**. – Peter Krauss Feb 01 '18 at 20:01

4 Answers


This only applies to PHP v7.2.0+ (current as of 2018).

mb_ord()

Now you can use mb_ord(). Example: echo mb_ord('€', 'UTF-8'); // 8364

See also mb_chr(), which returns the UTF-8 representation of a decimal code point.
Example: echo mb_chr(2048, 'UTF-8');
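A minimal sketch putting mb_ord(), mb_chr() and plain ord() side by side, assuming PHP 7.2+ with the mbstring extension enabled. The `"\u{20AC}"` escape (PHP 7.0+) is used so the bytes of € don't depend on how the file itself is saved:

```php
<?php
// Assumes PHP >= 7.2 with mbstring enabled.
// "\u{20AC}" always produces the UTF-8 bytes of €, whatever the file encoding.
$euro = "\u{20AC}";

echo mb_ord($euro, 'UTF-8'), "\n";          // 8364: the Unicode code point
echo ord($euro), "\n";                      // 226: only the first byte (0xE2)
echo bin2hex(mb_chr(8364, 'UTF-8')), "\n";  // e282ac: the three UTF-8 bytes
var_dump(mb_chr(mb_ord($euro, 'UTF-8'), 'UTF-8') === $euro); // bool(true)
```

The contrast between the first two lines is exactly the 8364-versus-226 difference the question ran into.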


The best practice is to be universal: save all your PHP scripts as UTF-8 (see @deceze's answer).

Peter Krauss

According to Wikipedia and FileFormat,

  • ISO-8859-1 doesn't have the Euro symbol at all
  • ISO-8859-15 has it at codepoint 164 (0xA4)
  • Windows-1252 has it at codepoint 128 (0x80)
  • Unicode has the Euro symbol at codepoint 8364 (0x20AC)
  • UTF-8 encodes that as 0xE2 0x82 0xAC. The first byte E2 is 226 in decimal.

So it seems your source file is encoded in UTF-8 (and ord() only returns the first byte), whereas your output is being interpreted as Windows-1252, which is why chr(128) displays as €.
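To see those byte values directly, here is a small sketch that converts € from UTF-8 into the two single-byte encodings listed above that contain it, then passes the result to ord(). It assumes the mbstring extension is available and that its mb_convert_encoding() accepts the names 'Windows-1252' and 'ISO-8859-15':

```php
<?php
// Assumes mbstring; "\u{20AC}" (PHP >= 7.0) is € as UTF-8 bytes E2 82 AC.
$euroUtf8 = "\u{20AC}";

foreach (['UTF-8', 'Windows-1252', 'ISO-8859-15'] as $encoding) {
    $bytes = mb_convert_encoding($euroUtf8, $encoding, 'UTF-8');
    // ord() only ever looks at the first byte of the string
    printf("%s: bytes=%s ord=%d\n", $encoding, bin2hex($bytes), ord($bytes));
}
// UTF-8: bytes=e282ac ord=226
// Windows-1252: bytes=80 ord=128
// ISO-8859-15: bytes=a4 ord=164
```

Only the Windows-1252 line gives the 128 the question expected.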

Tim Pietzcker
echo ord('€');  //opposite it returns 226, it should be 128

Your .php file is saved as UTF-8 (that is the encoding your text editor used when it wrote the file to disk). The string literal in there contains the bytes E2 82 AC; visualised as a double-quoted string (so PHP interprets the \x escapes), it's something like this:

echo ord("\xE2\x82\xAC");

Open the file in a hex editor for real clarity.

ord only returns a single integer in the range of 0 - 255. Your string literal contains three bytes, for which ord would need to return three integers, which it won't. It returns only the first one, which is 226.
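This can be reproduced without relying on the editor at all. A minimal sketch that spells out the three bytes explicitly and compares ord() with unpack('C*'), which returns every byte:

```php
<?php
// The three bytes a UTF-8 editor stores for '€' (double quotes so \x is interpreted)
$euro = "\xE2\x82\xAC";

echo strlen($euro), "\n";                      // 3: one character, three bytes
echo ord($euro), "\n";                         // 226: ord() reads only the first byte
echo implode(' ', unpack('C*', $euro)), "\n";  // 226 130 172: every byte
echo ord("\x80"), "\n";                        // 128: the single byte for € in Windows-1252
```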

Save the file in different encodings in your text editor and you'll see different results.

deceze

This PHP function returns the decimal code point of the first character in a UTF-8 string. How many octets that character occupies depends on the code point:

  • If the number is lower than 128, the character is encoded in 1 octet.
  • Else if the number is lower than 2048, the character is encoded in 2 octets.
  • Else if the number is lower than 65536, the character is encoded in 3 octets.
  • Else if the number is lower than 1114112, the character is encoded in 4 octets.
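The thresholds in that list translate directly into code. A sketch under the assumption that the input is a valid Unicode code point (0 to 1114111); the helper name utf8_octets is hypothetical:

```php
<?php
// Hypothetical helper: number of octets UTF-8 needs for a code point,
// using the same thresholds as the list above.
function utf8_octets(int $codePoint): int
{
    if ($codePoint < 0 || $codePoint >= 1114112) {
        throw new InvalidArgumentException("Not a Unicode code point: $codePoint");
    }
    if ($codePoint < 128) {
        return 1;
    }
    if ($codePoint < 2048) {
        return 2;
    }
    if ($codePoint < 65536) {
        return 3;
    }
    return 4;
}

echo utf8_octets(8364), "\n";   // 3 for the euro sign
echo strlen("\u{20AC}"), "\n";  // 3 as well: the actual byte count of €
```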

function ord_utf8($s){
// substr() avoids reading past the end of strings shorter than 4 bytes
return (int) ($s=unpack('C*',substr($s,0,4)))&&$s[1]<(1<<7)?$s[1]:
($s[1]>239&&$s[2]>127&&$s[3]>127&&$s[4]>127?(7&$s[1])<<18|(63&$s[2])<<12|(63&$s[3])<<6|63&$s[4]:
($s[1]>223&&$s[2]>127&&$s[3]>127?(15&$s[1])<<12|(63&$s[2])<<6|63&$s[3]:
($s[1]>193&&$s[2]>127?(31&$s[1])<<6|63&$s[2]:0)));
}

echo ord_utf8('€');

// Outputs 8364, so this character is encoded in 3 octets

You can check the result in https://eval.in/748181
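The one-liner is hard to follow, so here is an ungolfed sketch of the same bit arithmetic. The name ord_utf8_readable is hypothetical. It uses the same lead-byte thresholds (240, 224, 194) and the same convention of returning 0 for malformed input, but it checks continuation bytes more strictly (10xxxxxx rather than just > 127), so results may differ on invalid UTF-8:

```php
<?php
// Ungolfed sketch of ord_utf8(); the name ord_utf8_readable is hypothetical.
// Returns the code point of the first UTF-8 character in $s, or 0 if the
// leading bytes do not form a valid sequence.
function ord_utf8_readable(string $s): int
{
    // Look at (at most) the first four bytes, re-indexed from 0
    $bytes = array_values(unpack('C*', substr($s, 0, 4)) ?: []);
    if ($bytes === []) {
        return 0;
    }

    $lead = $bytes[0];
    if ($lead < 0x80) {          // 0xxxxxxx: 1 octet
        return $lead;
    } elseif ($lead >= 0xF0) {   // 11110xxx: 4 octets, 3 payload bits
        $len = 4;
        $cp = $lead & 0x07;
    } elseif ($lead >= 0xE0) {   // 1110xxxx: 3 octets, 4 payload bits
        $len = 3;
        $cp = $lead & 0x0F;
    } elseif ($lead >= 0xC2) {   // 110xxxxx: 2 octets, 5 payload bits
        $len = 2;
        $cp = $lead & 0x1F;
    } else {                     // continuation byte or overlong lead (0xC0, 0xC1)
        return 0;
    }

    if (count($bytes) < $len) {
        return 0;                // truncated sequence
    }
    for ($i = 1; $i < $len; $i++) {
        if (($bytes[$i] & 0xC0) !== 0x80) {
            return 0;            // not a 10xxxxxx continuation byte
        }
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);  // append 6 payload bits
    }
    return $cp;
}

echo ord_utf8_readable("\u{20AC}"), "\n";  // 8364
```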

The ord_utf8 function is the inverse of chr_utf8, which returns the UTF-8 encoding of one character given its decimal code point:

function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}

// Round-trip every code point from 0 to 1114111 (U+10FFFF)
for($test=0;$test<1114112;$test++)
if (ord_utf8(chr_utf8($test))!==$test)
die('Error found');

echo 'No error';

// Output No error
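Likewise, here is an ungolfed sketch of chr_utf8 (the name chr_utf8_readable is hypothetical), followed by a spot check against PHP 7.2's mb_chr() for a few code points. The spot check assumes the mbstring extension is available:

```php
<?php
// Ungolfed sketch of chr_utf8(); the name chr_utf8_readable is hypothetical.
// Returns the UTF-8 bytes for a code point, or '' if it is out of range.
function chr_utf8_readable(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                              // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                 // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x110000) {
        return chr(0xF0 | ($cp >> 18))                // 11110xxx
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return '';                                        // out of range, like chr_utf8()
}

// Spot check against mb_chr() (PHP >= 7.2, mbstring)
foreach ([0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF] as $cp) {
    $same = chr_utf8_readable($cp) === mb_chr($cp, 'UTF-8');
    printf("U+%04X %s\n", $cp, $same ? 'matches mb_chr()' : 'MISMATCH');
}
```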
Php'Regex