2

When I execute the following code, it outputs non-standard characters. How can I remove them or recover the original string?

header('Content-type: text/html; charset=utf-8');
$String = "�่อตั้งเมื่อ";
echo $String;

Output : �?่อตั้งเมื่อ

Need actual result : ก่อตั้งเมื่อ

Rajendra Yadav
  • 3
    How did you get/obtain the string `à¸?่อตั้งเมื่อ`??? – Marcx Jun 05 '15 at 06:35
  • 1
    Hint: That question mark near the start is wrong and it breaks the utf8 encoding. – Phil Jun 05 '15 at 06:56
  • Can you please show the image of your actual result? Because at least in my system there's no fonts which support those glyphs and thus what you wrote in "Output" is not really different from what you wrote in "Need actual result" - squares with code points in both lines. – hijarian Jun 05 '15 at 07:16
  • @hijarian Works for me on windows and android. Apparently they are thai characters. – Phil Jun 05 '15 at 08:04

3 Answers

2

Your string, �่อตั้งเมื่อ, is not valid UTF-8. That is why the � shows up: the browser does not know how to interpret the broken byte sequence.

As others have indicated, the question mark on the third position likely is the problem.

The first three bytes of the erroneous string are e0 b8 3f (3f being the ASCII code for ?). I do not know any Thai, but the byte sequence for THAI CHARACTER KO KAI looks very similar and should be e0 b8 81.
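The byte comparison above can be checked with a short sketch. The helper names `is_valid_utf8` and `repair_ko_kai` are hypothetical, and the repair assumes that the lost byte really was 0x81; if the string was damaged in other places too, this will not recover it.

```php
<?php
// Returns true when $s is a well-formed UTF-8 byte sequence.
// With the /u modifier PCRE validates the subject as UTF-8, and
// preg_match() returns false (not 0) when the subject is invalid.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Assumption: every e0 b8 3f sequence was originally e0 b8 81 (KO KAI).
// e0 b8 followed by 3f is never valid UTF-8, so intact text is untouched.
function repair_ko_kai(string $s): string
{
    return str_replace("\xE0\xB8\x3F", "\xE0\xB8\x81", $s);
}

$broken = "\xE0\xB8\x3F"; // first three bytes of the string in the question
$koKai  = "\xE0\xB8\x81"; // U+0E01 THAI CHARACTER KO KAI in UTF-8

echo bin2hex($broken), ' valid: ', var_export(is_valid_utf8($broken), true), "\n"; // e0b83f valid: false
echo bin2hex($koKai), ' valid: ', var_export(is_valid_utf8($koKai), true), "\n";   // e0b881 valid: true
echo bin2hex(repair_ko_kai($broken)), "\n";                                        // e0b881
```

A repair like this only patches the symptom. If possible, it is safer to go back to wherever the string was copied from and fetch it again without passing it through a single-byte encoding.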

Anders Lindahl
  • 1
    That would also explain why it showed up as a question mark. 0x81 is undefined in latin1 and in windows-1252. Perhaps the OP tried to copy and paste the encoded string. BTW good dedication on finding the character. – Phil Jun 05 '15 at 08:01
0

You declared the character encoding as UTF-8, whereas the string itself is not encoded as valid UTF-8. That is why a "?" appears in the output instead of the intended character.

0

First of all, to avoid being confused by encoding problems, you really want to read the following article: http://kunststube.net/encoding/

Second, I just did the following:

$ vim ~/sandbox/php/encoding.php
( inserted your code verbatim )
$ cd ~/sandbox/php/
$ php -S localhost:1200

When I opened the page http://localhost:1200 in Firefox, I got the contents of $String as they are.

That is, I got the following line of characters:

�่อตั้งเมื่อ

This means your browser, whichever it is, does not know how to render the characters you are sending to it. The string itself is encoded in UTF-8 correctly. You have to set your browser to display the text as UTF-8, or possibly install fonts that support those symbols.

Also, if you want to output, say some text with UTF-8 Devanagari symbols, you just need to satisfy the following requirements in PHP:

  1. Your source code file must be saved in UTF-8.
  2. You must send the UTF-8 charset in the Content-Type header, which you already do.
  3. You must put the string you want to output verbatim in the source code. There is no need to encode it in any way; PHP does not care.
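As a minimal sketch of those three requirements, assuming the file itself is saved as UTF-8 and using the intended string from the question (the variable name `$text` is only illustrative):

```php
<?php
// Requirement 1: this source file must be saved as UTF-8.

// Requirement 2: declare the charset in the Content-Type header,
// before any output is sent.
header('Content-Type: text/html; charset=utf-8');

// Requirement 3: the literal is written verbatim. PHP keeps the raw
// bytes from the file and does not re-encode them.
$text = 'ก่อตั้งเมื่อ';

echo $text, "\n";

// Sanity check: the first character should occupy the three bytes
// e0 b8 81 if the file really was saved as UTF-8.
echo bin2hex(substr($text, 0, 3)), "\n";
```

If the hex dump shows anything other than e0b881, the file was most likely saved in a different encoding or the literal was damaged when it was pasted.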
hijarian