I, like many other PHP developers, have had issues with character encoding. This question outlines the steps I go through to make sure my data is saved and output as UTF-8. I would like advice on what else I should consider and/or change in my current approach.
My MySQL database has DEFAULT CHARACTER SET utf8.
My tables have a collation of utf8_general_ci.
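I am using a PHP script to read data from an RSS feed and then save that data to my database. Before saving it, I make sure the data is UTF-8 by detecting its encoding and converting it:
```php
protected function _convertToUTF8($content) {
    // Detect the source encoding; mb_detect_encoding() returns false if it cannot decide
    $enc = mb_detect_encoding($content);
    if ($enc === false) {
        return $content; // unknown encoding: leave the string unchanged rather than pass false on
    }
    return mb_convert_encoding($content, "UTF-8", $enc);
}
```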
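A standalone variant of the conversion step, as a minimal sketch. The function name `convertToUtf8` and the `$candidates` parameter are hypothetical, and the assumption that non-UTF-8 feed text is ISO-8859-1 is mine, not something the feed declares. Checking with `mb_check_encoding()` first means text that is already valid UTF-8 is never converted a second time; strict mode in `mb_detect_encoding()` only accepts a candidate if the whole string is valid in that encoding.

```php
<?php
// Hypothetical helper (not the original _convertToUTF8): normalise a string to UTF-8.
// Assumption: anything that is not already valid UTF-8 is in one of $candidates.
function convertToUtf8(string $content, array $candidates = ['ISO-8859-1']): string
{
    // Valid UTF-8 (plain ASCII included) is returned untouched, so it is never encoded twice.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    // Strict mode only accepts a candidate if every byte is valid in that encoding.
    $enc = mb_detect_encoding($content, $candidates, true);
    if ($enc === false) {
        throw new UnexpectedValueException('Source encoding not among the candidates');
    }
    return mb_convert_encoding($content, 'UTF-8', $enc);
}

echo bin2hex(convertToUtf8("caf\xE9")), PHP_EOL;     // ISO-8859-1 input, prints 636166c3a9
echo bin2hex(convertToUtf8("caf\xC3\xA9")), PHP_EOL; // already UTF-8, prints 636166c3a9
```

Because every byte sequence is valid ISO-8859-1, the default candidate list always matches; the exception branch only matters if you pass a stricter list of candidates.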
When outputting this data to a web page, I set the header in PHP:
```php
header("Content-Type: text/html; charset=utf-8");
```
and I also set the Content-Type meta tag to UTF-8:
```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
```
So far everything works as expected: no strange characters appear in the output and all is going smoothly. Should I be changing or considering anything else when dealing with this data?
The problem I am now having is writing this data to a text (CSV) file. I am using fwrite(), which successfully creates the file, but the third party I am passing the file to says it is not UTF-8. I am not sure whether the data is being written as UTF-8; how can I check this? When I log in to the remote server over SSH, the same text looks different depending on the tool I use:
cat shows: Itâs a
vim shows: Itâ~@~Ys
less shows: It<E2><80><99>s
What am I missing here?
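The `less` output is a useful clue: E2 80 99 is the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK, ’), so the bytes on disk look like UTF-8, and `cat` and `vim` may simply be displaying them through a non-UTF-8 locale or file encoding. One way to check from PHP itself is to read the file back and validate its bytes. This is a minimal sketch: the rows and the temporary file are hypothetical stand-ins for the real export, and it uses `fputcsv()` rather than `fwrite()` only to handle CSV quoting.

```php
<?php
// Sketch: write UTF-8 rows to a CSV file, then verify the bytes that ended up on disk.
// The rows and the temporary file path are hypothetical stand-ins for the real export.
$rows = [
    ['id', 'title'],
    ['1', "It\u{2019}s a test"], // U+2019 RIGHT SINGLE QUOTATION MARK
];

$path = tempnam(sys_get_temp_dir(), 'csv');
$fh = fopen($path, 'wb');
foreach ($rows as $row) {
    // Writes the strings' bytes as-is; fputcsv() does no re-encoding.
    // The escape argument is passed explicitly to avoid a deprecation notice on newer PHP.
    fputcsv($fh, $row, ',', '"', '\\');
}
fclose($fh);

$bytes = file_get_contents($path);
unlink($path);

// true only if the whole file is well-formed UTF-8
var_dump(mb_check_encoding($bytes, 'UTF-8'));            // bool(true)

// The apostrophe should appear as the three-byte sequence E2 80 99
var_dump(strpos($bytes, "\xE2\x80\x99") !== false);      // bool(true)
```

If a check like this passes on the real file but the recipient still rejects it, it may be worth asking which encoding their importer expects; some tools, Microsoft Excel among them, only recognise a UTF-8 CSV when it starts with the byte-order mark EF BB BF.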
Thanks in advance!