I'm writing a script to import data from a CSV file into a database. We are not sure the file will always be in UTF-8, because the files will be created by "average" users on Windows.
Here are the functions I ended up with:
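// Guess the encoding of the whole file with fileinfo.
function isUTF8($filename)
{
    $info = finfo_open(FILEINFO_MIME_ENCODING);
    $type = finfo_buffer($info, file_get_contents($filename));
    finfo_close($info);
    // Plain ASCII is a subset of UTF-8, so it counts as UTF-8 too.
    return $type === 'utf-8' || $type === 'us-ascii';
}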
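function returnStringUTF8($string, $isUTF8)
{
    // Convert only when the file or the string itself does NOT look like UTF-8.
    // mb_detect_encoding() in strict mode returns false for invalid UTF-8.
    if (!$isUTF8 || !mb_detect_encoding($string, 'UTF-8', true)) {
        // utf8_encode() assumes the input is ISO-8859-1.
        $string = utf8_encode($string);
    }
    return $string;
}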
Here is how I will use them:
$isUTF8 = isUTF8($filename);
.... Parsing the file
$myUTF8EncodedString = returnStringUTF8($stringFromTheFile, $isUTF8);
....
The function isUTF8 seems to work fine in my tests, but I've read that its detection can sometimes be wrong. That's why I decided to "double check" by adding the function returnStringUTF8. But I'm not quite sure whether this function will always return the right thing, that is, a string encoded in UTF-8.
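For comparison, here is a minimal sketch of a per-string check that does not depend on the file-level guess. The helper name toUTF8 and the $fallbackEncoding parameter are hypothetical, and the sketch assumes that any input that is not valid UTF-8 is Windows-1252, which is only a guess for files produced on Windows. It uses mb_check_encoding and mb_convert_encoding rather than utf8_encode, which only handles ISO-8859-1 and is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not part of the original code): return $string as UTF-8.
// Assumption: anything that is not valid UTF-8 is encoded as $fallbackEncoding.
function toUTF8(string $string, string $fallbackEncoding = 'Windows-1252'): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged,
    // so already-correct input is never encoded twice.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }
    // Otherwise reinterpret the bytes using the assumed legacy encoding.
    return mb_convert_encoding($string, 'UTF-8', $fallbackEncoding);
}
```

This cannot be fully reliable: a legacy-encoded string whose bytes happen to form valid UTF-8 sequences would be passed through unchanged, and a file in some other encoding would be converted incorrectly.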