mb_detect_encoding showing the same encoding

Question

I have a weird problem with the following code:

$str = "נסיון"; // <--- Hebrew chars
echo mb_detect_encoding ($str)."<br><br><br>";
$str = iconv (mb_detect_encoding($str),'UCS-2BE',$str);
echo mb_detect_encoding ($str)."<br><br><br>";

This will output:

UTF-8

This code is in a file saved (using Notepad++) as UTF-8 without BOM. I tried other file encodings, but that didn't help.

I also tried converting the string using:

$str = mb_convert_encoding($str,'UCS-2BE');

But that didn't work either. Any insights?

What is the problem? Detecting Hebrew? How about `preg_match('/[\x{0591}-\x{05F4}]/u', $sData);` — Alma Do, Aug 08 '13 at 13:03
Similar issue: http://stackoverflow.com/questions/17104340/mb-detect-encoding-doesnt-properly-working-with-windows-1250-cp1250 — giorgio79, Jan 05 '15 at 10:58

Joni · Accepted Answer · 2013-08-08T13:10:23.110

From the documentation for mb_detect_order, the function that establishes the order in which mb_detect_encoding tests different encodings:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail. UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

So, you can't detect the encoding of the second string with the mb functions.
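
Since detection cannot confirm the result, one way to check that the conversion itself worked is to look at the raw bytes and round-trip the string. This is only a minimal sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8; the variable names ($utf8, $ucs2, $back) are illustrative and not from the question.

```php
<?php
// Hebrew sample text from the question; this file is assumed to be saved as UTF-8.
$utf8 = "נסיון";

// Convert explicitly, naming the source encoding instead of detecting it.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');

// Inspect the raw bytes. In UTF-8 each Hebrew letter is a two-byte sequence
// starting with d6 or d7; in UCS-2BE it is the code point itself (05 xx).
echo bin2hex($utf8), "\n";
echo bin2hex($ucs2), "\n";

// Round-trip back to UTF-8 to confirm nothing was lost along the way.
$back = mb_convert_encoding($ucs2, 'UTF-8', 'UCS-2BE');
var_dump($back === $utf8);
```

If the two hex dumps differ and the round trip returns the original string, the conversion succeeded even though mb_detect_encoding cannot report UCS-2BE.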
