I set my header as follows:

header( 'Content-Type: text/html; charset="utf-8"' );

and then output a local file on my server to the browser using the following code-segment:

$content = file_get_contents($sPath);
$content = mb_convert_encoding($content, 'UTF-8');
echo $content;

The files on the server are created by Lua, and the following outputs FALSE (before conversion):

var_dump( mb_detect_encoding($content) );

The files contain some characters like (™) etc., and these appear as plain square boxes in browsers. I've read the threads that were suggested as similar questions, and none of the variations I tried in my code helped.

There seem to be no problems when I simply use the following:

header( 'Content-Type: text/html; charset="iso-8859-1"' );
// setting path here
$content = file_get_contents($sPath);
echo $content;
hjpotter92

2 Answers

There seem to be no problems when I simply use the following:

header( 'Content-Type: text/html; charset="iso-8859-1"' );
// setting path here
$content = file_get_contents($sPath);
echo $content;

So this means the file content is actually encoded in ISO-8859-1. If you want to output this as UTF-8, then explicitly convert from ISO-8859-1 to UTF-8:

$content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');

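You always need to know what you're converting from. If you just tell PHP to "convert to UTF-8" and omit the source encoding, `mb_convert_encoding` does not inspect the data; it assumes the string is in PHP's configured internal encoding, which need not match the file, and in your case it does not work.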
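As a rough illustration of why the source encoding matters, here is a minimal sketch. The helper name `toUtf8` and the `Windows-1252` default are assumptions made for this example; nothing in the question confirms the real encoding of the files. It relies on two facts: byte 0x99 is the trademark sign (™) in Windows-1252 but a control character in ISO-8859-1, and browsers treat the `iso-8859-1` label as Windows-1252.

```php
<?php
// Hypothetical helper: converts bytes from a stated source encoding to UTF-8.
// The 'Windows-1252' default is an assumption, not confirmed by the question.
function toUtf8(string $bytes, string $sourceEncoding = 'Windows-1252'): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// "\x99" is the trademark sign in Windows-1252 and a C1 control in ISO-8859-1.
$sample = "Lua\x99";

// Treated as Windows-1252, 0x99 becomes U+2122 (UTF-8 bytes e2 84 a2).
echo bin2hex(toUtf8($sample)), "\n";

// Treated as ISO-8859-1, 0x99 becomes U+0099 (UTF-8 bytes c2 99), which has no glyph.
echo bin2hex(toUtf8($sample, 'ISO-8859-1')), "\n";
```

If the second form is what gets sent, the browser receives an invisible control character, which would be consistent with the boxes described in the question. Whether that applies here depends on what encoding the files are really in.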

deceze
  • But `mb_detect_encoding($content)` results in false. – hjpotter92 Jan 15 '14 at 10:00
  • `mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');` didn't work either. – hjpotter92 Jan 15 '14 at 10:03
  • Don't ever count on anything `mb_detect_encoding` says; detecting encodings is fundamentally impossible with any degree of accuracy. What you need to know first and foremost is what encoding the file is actually in. You're not going to get anywhere without knowing that. – deceze Jan 15 '14 at 10:09
  • `file --mime myfile.txt` gave: **text/plain; charset=unknown-8bit**. As I said, the files are being created with lua. – hjpotter92 Jan 15 '14 at 10:12
  • Even lua will have to write the file in *some* known, specified encoding. Figure out what that is. Maybe you can get it to directly write UTF-8. – deceze Jan 15 '14 at 10:24

Check the file encoding: is it UTF-8 without a BOM? For example, Notepad++ can show a file's encoding.

Or maybe this is useful:
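The same check can be sketched in PHP itself. The helper name `describeBytes` is hypothetical and only reports two facts about the raw bytes: whether they form valid UTF-8 and whether they start with a UTF-8 BOM. It cannot tell you which legacy encoding a non-UTF-8 file uses.

```php
<?php
// Hypothetical helper, not part of the original answer.
function describeBytes(string $bytes): array
{
    return [
        // True only if the byte sequence is well-formed UTF-8.
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        // True if the data starts with the UTF-8 byte order mark EF BB BF.
        'has_bom' => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
    ];
}

// A lone 0x99 byte is not valid UTF-8.
var_dump(describeBytes("Lua\x99"));

// The UTF-8 encoding of the trademark sign, without a BOM.
var_dump(describeBytes("Lua\xE2\x84\xA2"));
```

In the question's code this could be called as `describeBytes(file_get_contents($sPath))` before deciding whether any conversion is needed.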

$content = file_get_contents($sPath);
$content = htmlentities($content);
echo $content;

Or try in .htaccess:

AddDefaultCharset utf-8
AddCharset utf-8 *
<IfModule mod_charset.c>
    CharsetSourceEnc utf-8
    CharsetDefault utf-8
</IfModule>
Victor Bocharsky