I have a weird problem with the following code:

$str = "נסיון"; // <--- Hebrew chars
echo mb_detect_encoding($str)."<br><br><br>";
$str = iconv(mb_detect_encoding($str), 'UCS-2BE', $str);
echo mb_detect_encoding($str)."<br><br><br>";

This will output:

UTF-8

UTF-8

This code is in a file saved (using Notepad++) as UTF-8 without BOM. I tried other file encodings, but they didn't work either.

I also tried converting the string using:

$str = mb_convert_encoding($str,'UCS-2BE');

But that didn't work either. Any insights?

eric.itzhak
  • What is the problem? To detect hebrew? How about `preg_match('/[\u0591-\u05F4]/', $sData);` – Alma Do Aug 08 '13 at 13:03
  • similar issue http://stackoverflow.com/questions/17104340/mb-detect-encoding-doesnt-properly-working-with-windows-1250-cp1250 – giorgio79 Jan 05 '15 at 10:58

1 Answer

From the documentation for mb_detect_order, the function that establishes the order in which mb_detect_encoding tests different encodings:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail: UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

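So you can't detect the encoding of the second string (the one converted to UCS-2BE) with the mb functions. The second "UTF-8" in your output says nothing about whether the conversion itself worked.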
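Since detection is not available for UCS-2, one way to check the conversion is to look at the raw bytes and to convert back using an encoding you state explicitly. The following is only a minimal sketch: it assumes the file is saved as UTF-8 and that the mbstring extension is loaded, and the variable names ($original, $converted, $hex, $roundTrip) are made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$original = "נסיון";

// Convert to big-endian UCS-2, naming the source encoding explicitly
// instead of relying on mb_detect_encoding().
$converted = mb_convert_encoding($original, 'UCS-2BE', 'UTF-8');

// Inspect the raw bytes: each of the 5 characters should take 2 bytes.
$hex = bin2hex($converted);
echo strlen($converted), " bytes: ", $hex, "\n";

// Convert back, again with an explicit source encoding, and compare.
$roundTrip = mb_convert_encoding($converted, 'UTF-8', 'UCS-2BE');
var_dump($roundTrip === $original);
```

If the byte count is twice the character count and the round trip returns the original string, the conversion worked, regardless of what mb_detect_encoding reports for the converted string.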

Joni