SOLUTION:

$output = '–– € ––';
// Written like this, PHP 5 does not handle the string as expected, because it treats it as single-byte characters.
// So I found the function below to build multi-byte characters into a string.

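// Unicode version of PHP's chr(): accepts code points as separate arguments or as one array.
function uchr($codes) {
    // A scalar first argument means the code points were passed as separate arguments.
    if (is_scalar($codes)) {
        $codes = func_get_args();
    }
    $str = '';
    foreach ($codes as $code) {
        // Decode the numeric entity &#NNNN; into its UTF-8 byte sequence.
        $str .= html_entity_decode('&#' . $code . ';', ENT_NOQUOTES, 'UTF-8');
    }
    return $str;
}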

// Decimal Unicode code points used below: – is 8211, [space] is 32, € is 8364
$output = uchr(8211,8211,32,8364,32,8211,8211);

//or
$output = uchr(8211,8211).' '.uchr(8364).' '.uchr(8211,8211);

echo $output;
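
As a hedged aside: on newer PHP versions the same string can be built without a helper function. The sketch below assumes PHP 7.2 or later with the mbstring extension, where the "\u{...}" escape (PHP 7.0+) and mb_chr() (PHP 7.2+) are available. Neither exists in PHP 5, so this is an alternative, not a replacement for uchr() there.

```php
<?php
// Unicode code point escapes inside double-quoted strings (PHP 7.0+).
// U+2013 is the "–" character (decimal 8211), U+20AC is "€" (decimal 8364).
$escaped = "\u{2013}\u{2013} \u{20AC} \u{2013}\u{2013}";

// mb_chr() turns a code point into a string in the given encoding (PHP 7.2+).
$dash = mb_chr(8211, 'UTF-8');
$euro = mb_chr(8364, 'UTF-8');
$built = $dash . $dash . ' ' . $euro . ' ' . $dash . $dash;

// Both approaches produce the same UTF-8 byte sequence.
var_dump($escaped === $built);
echo $built, PHP_EOL;
```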

QUESTION:

How can I write these special characters to a simple file?

$file = "./upload/myfile.txt";
$output = "–– € ––".PHP_EOL; // the "–" is an en dash, not an underscore _ or a hyphen -
file_put_contents($file, $output);

If I access this file from the browser at http://mydomain.com/upload/myfile.txt, I only get "�" characters.

However, if I save "–– € ––" with Zend Developer or my local text editor (on OS X) and upload the file, everything is perfectly fine. The browser shows it correctly.

How can I achieve this with PHP? It seems PHP writes the file differently than my MacBook does, though I thought PHP's default was UTF-8, and I also saved the file as UTF-8 in my local text editor.

EXTRA INFO: In the .htaccess file in the upload folder I wrote:

  AddDefaultCharset utf-8
  AddCharset utf-8 .txt

Otherwise the Firebug add-on for Firefox reported that the charset was not specified.

Any ideas? It has to do with how the file is saved, because my uploaded file displays correctly.

I tried different options while saving the file, like:

$output = mb_convert_encoding($output, 'UTF-8', 'OLD-ENCODING');

and PHP's iconv function, but I can't find the solution.

Any help is greatly appreciated.

EDIT: If I get the content of my uploaded file and echo it, the following happens:

$output = file_get_contents('./upload/myuploadedfile.txt', FILE_USE_INCLUDE_PATH);
echo $output;    // shows correctly: –– € ––
echo $output[1]; // shows a �
echo $output[3]; // shows a �
fellowworldcitizen

1 Answer

PHP writes the contents of the file exactly as they appear in your source code. It takes the bytes exactly as they are encoded in your .php file and puts them into the file. From then on, it depends on how the file is interpreted. Assuming your source code is actually UTF-8 encoded, the file will be too. Try opening it with a text editor that understands UTF-8. Change the encoding the browser uses to interpret it to UTF-8 (View menu > Encoding). Check whether the web server actually sets the correct charset header when you open the file in the browser (Firebug Network tab, response headers).
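
A minimal sketch of the point above, assuming the .php source file itself is saved as UTF-8. The temporary file is a placeholder for illustration, not the path from the question. It writes the string, reads it back, and inspects the raw bytes that ended up on disk.

```php
<?php
// Assumption: this source file is saved as UTF-8, so the literal below
// is already a sequence of UTF-8 bytes.
$output = "–– € ––";

// Hypothetical scratch file; the question uses ./upload/myfile.txt instead.
$file = tempnam(sys_get_temp_dir(), 'utf8');
file_put_contents($file, $output);

// Read the bytes back and clean up.
$written = file_get_contents($file);
unlink($file);

// PHP copied the bytes verbatim, so the two strings are identical.
var_dump($written === $output);

// Hex dump of the file contents; in UTF-8, "–" is e2 80 93 and "€" is e2 82 ac.
echo bin2hex($written), PHP_EOL;

// True when the bytes form valid UTF-8 (requires the mbstring extension).
var_dump(mb_check_encoding($written, 'UTF-8'));
```

If the hex dump shows the expected UTF-8 bytes, the file itself is fine and the remaining suspect is how the browser or server declares the encoding.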

It's correct that $output[0] shows a broken UTF-8 character: string offsets in PHP address bytes, so you only get the first byte of the multi-byte character "–".
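
To illustrate, here is a small sketch comparing byte indexing with character-aware functions. It assumes the mbstring extension is available and that the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
$output = "–– € ––";

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$byteLength = strlen($output);
$charLength = mb_strlen($output, 'UTF-8');

// $output[0] is only the first of the three bytes that encode "–".
$firstByte = $output[0];

// mb_substr() cuts on character boundaries, so it returns the whole "–".
$firstChar = mb_substr($output, 0, 1, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, ' characters', PHP_EOL;
echo bin2hex($firstByte), PHP_EOL; // a single byte
echo bin2hex($firstChar), PHP_EOL; // the full three-byte sequence
```

Echoing $firstByte on its own sends an incomplete UTF-8 sequence to the browser, which is why it is rendered as "�".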

For more in-depth information, see What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.

deceze