
I have a Unicode string received over HTTP POST or fetched from a DB (it does not matter which).

In PHP I checked the encoding of the string using "mb_detect_encoding" and got UTF-8 as the result.

So the string is UTF-8 encoded.

But how do I write the string from PHP to an output file with the proper encoding?

    $fd = fopen('myfile.php', "wb");
    fwrite($fd, $msg."\n");

What I see is "टेसà¥à¤Ÿ" instead of the actual string which is टेस्ट्

Pasting the 'junk' into Notepad++ and then choosing the 'encoding UTF-8' menu option shows the proper text.

EDIT *SOLUTION*

Sorry for posting the question and figuring out the answer myself.

I found the solution at the following site http://www.codingforums.com/showthread.php?t=129270

    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }

Anand

3 Answers


PHP does not change the encoding of a string, or do anything else with it, when you write it to a file. It simply dumps the bytes of the string (PHP strings are really byte arrays) into the file, period. If you actually receive the string as UTF-8 and do nothing with it except write it to a file, the content of the file will be UTF-8 encoded. Your problem is most likely that whatever application you're using to view the file does not read it as UTF-8.
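To make the "bytes in, bytes out" point concrete, here is a minimal sketch. It assumes PHP 7+ (for the `\u{...}` escapes) with the mbstring extension loaded; the sample text and the temporary file are only for illustration.

```php
<?php
// The Devanagari sample is built from code point escapes so that the
// encoding of this source file itself does not matter (PHP 7+).
$msg  = "\u{091F}\u{0947}\u{0938}\u{094D}\u{091F}";
$path = tempnam(sys_get_temp_dir(), 'utf8');   // scratch file for the demo

// Write the string exactly as the question does, then read it back.
$fd = fopen($path, "wb");
fwrite($fd, $msg . "\n");
fclose($fd);
$readBack = file_get_contents($path);
unlink($path);

var_dump($readBack === $msg . "\n");              // bool(true): identical bytes
var_dump(mb_check_encoding($readBack, 'UTF-8'));  // bool(true): still valid UTF-8
var_dump(strlen($msg));                           // int(15): bytes
var_dump(mb_strlen($msg, 'UTF-8'));               // int(5): code points
```

If the file comes back byte-identical and valid, the writing side is fine and the viewer is what needs to be told about UTF-8.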


That BOM solution is not necessarily the best. A BOM is not necessary for UTF-8, and many applications have problems with it. It only helps applications that are otherwise unable (too stupid) to detect that a file is UTF-8 encoded. The better solution may be to explicitly tell the application in question to treat the file as UTF-8 when opening it. Or use a better application.
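If a BOM has already ended up in some content and a consumer chokes on it, it can be stripped again. A rough sketch, assuming PHP 7+; `UTF8_BOM` and `stripUtf8Bom` are hypothetical names made up for this example, not built-in PHP.

```php
<?php
// The three bytes that form the UTF-8 "BOM".
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $content without a leading UTF-8 BOM.
// Content that does not start with a BOM is returned unchanged.
function stripUtf8Bom(string $content): string
{
    if (strncmp($content, UTF8_BOM, 3) === 0) {
        return substr($content, 3);
    }
    return $content;
}

var_dump(stripUtf8Bom(UTF8_BOM . "abc") === "abc");  // bool(true)
var_dump(stripUtf8Bom("abc") === "abc");             // bool(true)
```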

deceze

You have to specify the strict parameter of mb_detect_encoding, or you'll get many false positives.
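A small sketch of what the strict flag looks like in practice, assuming the mbstring extension is loaded. The sample bytes and the candidate list are made up for illustration. `mb_check_encoding` is shown as well, as a more direct way to ask "is this valid UTF-8?".

```php
<?php
$utf8   = "caf\xC3\xA9";   // "café" encoded as UTF-8
$latin1 = "caf\xE9";       // "café" encoded as ISO-8859-1; 0xE9 alone is not valid UTF-8

$candidates = ['UTF-8', 'ISO-8859-1'];

// Third argument true = strict: an encoding is only reported if the
// bytes are actually valid in it.
var_dump(mb_detect_encoding($latin1, $candidates, true));  // string(10) "ISO-8859-1"

// Direct validity checks, no guessing involved.
var_dump(mb_check_encoding($utf8, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)
```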

Also, while the output may be UTF-8, you will have to specify the right header (Content-Type with a charset) and/or the charset meta tag (if it's HTML).

GolezTrol
  • The output is a PHP file. How do I specify headers for a PHP file? – Anand Mar 23 '12 at 08:03
  • Sorry, I misunderstood that. About the solution you posted yourself: note that when you start your PHP file with a byte order mark (BOM), this BOM will also be output when you include the PHP file. This may result in an unwanted BOM being output at the start of (or somewhere in the middle of) a page that is rendered using this generated PHP file. – GolezTrol Mar 23 '12 at 10:35

Sorry for posting the question and figuring out the answer myself.

I found the solution at the following site: http://www.codingforums.com/showthread.php?t=129270

    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }

Anand
  • The “UTF-8 BOM” is bogus; there is no byte order to mark for UTF-8. Unfortunately many tools from Microsoft Land default to including the faux-BOM in UTF-8 output, and in some cases failing to read UTF-8 input otherwise. Your original file was already valid and correct UTF-8; if Notepad++ isn't defaulting to recognising files as UTF-8 you should probably change that setting. – bobince Mar 23 '12 at 15:30