On my website (a hobby project, not commercial) I have to manage around 8 different translations that are provided by volunteers. I manage these translations in Google Sheets. The sheets are saved as CSV and read by PHP whenever it encounters a string in need of translation. So far, it works flawlessly! :D [ me happy ]
To update these sheets, I have written a PHP routine that collects all the strings-to-be-translated and puts them in a CSV. I then import this CSV into Google Sheets again.
This works beautifully, except for a couple of characters per language. For example, in portuguese, the 'à' displays as the notorious lozenge-with-a-question-mark symbol,
Also chinese goes fine(!), except for one character:
I know there are many questions and answers on this subject here at Stack Overflow. After reading them, I discovered that my files were initially written out in "Western (Mac OS)" encoding. I have now added the BOM bytes, and TextWrangler does recognize the file as UTF-8, but it warns me that the file is corrupt. Indeed, the suspicious characters don't display well in TextWrangler either.
I also see references to a function called 'iconv', but that doesn't seem to have any influence.
I have the feeling I am missing a crucial step. Would you mind having a look at a piece of code and helping me further?
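For reference, a conversion only helps when the bytes are not valid UTF-8 yet and when the source encoding is named correctly. The following is a minimal sketch of that idea, not my actual code: `to_utf8()` is a hypothetical helper, it relies on the mbstring extension, and the default source encoding `ISO-8859-1` is purely an assumption for illustration.

```php
<?php
// Hypothetical helper: convert a value to UTF-8 only when it is not valid UTF-8 yet.
// ASSUMPTION: the source encoding must match how the bytes were really produced;
// 'ISO-8859-1' is only a placeholder default.
function to_utf8(string $value, string $sourceEncoding = 'ISO-8859-1'): string
{
    // Valid UTF-8 (including plain ASCII) is returned untouched,
    // so already-correct strings are not converted a second time.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Otherwise reinterpret the bytes as the assumed source encoding.
    return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
}
```

With this guard, a string that is already UTF-8 passes through unchanged, which would explain why a conversion call can appear to have no effect.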
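// Write the translations to a CSV file
$fp = fopen("languages/gsheet_$language.csv", 'w');
// UTF-8 byte order mark (EF BB BF); written to $fp, the handle opened above
fwrite($fp, pack("CCC", 0xef, 0xbb, 0xbf));
write_csv($fp, array('key', 'translation', 'notes'));
foreach ($rows as $cols) {
    // convert() is not shown here; to change values it must take its argument by reference
    array_walk($cols, "convert");
    write_csv($fp, $cols);
}
fclose($fp);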
}
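function write_csv($fp, $row) {
    foreach ($row as $key => $value) {
        // Wrap each field in double quotes and double any embedded quotes
        $row[$key] = '"' . str_replace('"', '""', $value) . '"';
    }
    // One comma-separated record per line
    fwrite($fp, implode(",", $row));
    fwrite($fp, "\n");
}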
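As an alternative to the hand-rolled writer above, PHP's built-in `fputcsv()` takes care of quoting and of doubling embedded quotes. This is only a sketch under assumptions: `write_utf8_csv()` is a hypothetical name, the sample rows are made up, and it presumes the values are already valid UTF-8, since `fputcsv()` writes bytes as they are and does no conversion.

```php
<?php
// Hypothetical replacement for write_csv(), built on PHP's fputcsv().
// ASSUMPTION: every value in $rows is already valid UTF-8.
function write_utf8_csv($fp, array $rows): void
{
    // UTF-8 byte order mark, so spreadsheet tools can detect the encoding.
    fwrite($fp, "\xEF\xBB\xBF");

    foreach ($rows as $row) {
        // Separator, enclosure and escape character are passed explicitly.
        fputcsv($fp, $row, ',', '"', '\\');
    }
}

// Usage example writing to memory instead of a real file (sample data is made up).
$fp = fopen('php://memory', 'w+');
write_utf8_csv($fp, [
    ['key', 'translation', 'notes'],
    ['greeting', 'say "olá"', ''],
]);
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);
```

Note that `fputcsv()` only encloses fields that need it, so the output is not byte-for-byte identical to what `write_csv()` produces.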
The post "UTF-8 all the way through" is very informative, but it focuses strongly on the database. In the end, I found the origin of the problem to be in the reading side, in the fgetcsv function, which the PHP manual documents as taking the locale settings into account: http://php.net/manual/en/function.fgetcsv.php#96049
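A minimal sketch of a reader that works around this, under a few assumptions: `read_utf8_csv()` is a hypothetical name, the locale `en_US.UTF-8` is assumed to be installed on the server (`setlocale()` returns false if it is not), and the file may or may not start with a BOM.

```php
<?php
// Hypothetical reader: set a UTF-8 locale for the duration of the fgetcsv() calls.
// ASSUMPTION: 'en_US.UTF-8' exists on the server; otherwise setlocale() fails
// and the current locale simply stays in effect.
function read_utf8_csv(string $path, string $locale = 'en_US.UTF-8'): array
{
    $rows = [];
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return $rows;
    }

    // Passing '0' returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    setlocale(LC_CTYPE, $locale);

    // Skip a leading UTF-8 BOM so it does not end up inside the first field.
    if (fread($fp, 3) !== "\xEF\xBB\xBF") {
        rewind($fp);
    }

    while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() returns array(null) for a blank line; skip those.
        if ($row !== [null]) {
            $rows[] = $row;
        }
    }

    fclose($fp);
    // Restore the locale that was active before.
    setlocale(LC_CTYPE, $previous);

    return $rows;
}
```

Restoring the previous locale keeps the change from leaking into the rest of the request.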