How to detect if a string needs to be converted to UTF-8

Question

I'm writing a script to import data from a CSV file into a database. We are not sure the file will always be in UTF-8. The files will be created by "average" users on Windows.

Here are the functions I ended up with:

function isUTF8($filename)
{
    $info = finfo_open(FILEINFO_MIME_ENCODING);
    $type = finfo_buffer($info, file_get_contents($filename));
    finfo_close($info);

    return $type == 'utf-8' || $type == 'us-ascii';
}

function returnStringUTF8($string,$isUTF8){
    if(!$isUTF8 || mb_detect_encoding($string, 'UTF-8', true)){
        $string=utf8_encode($string);
    }
    return $string;
}

Here is how I will use them:

$isUTF8 = isUTF8($filename);

.... Parsing the file

$myUTF8EncodedString = returnStringUTF8($stringFromTheFile, $isUTF8);

....

The function isUTF8 seems to work fine according to my tests, but I've read somewhere that it can sometimes be wrong. That's why I decided to "double check" by adding the function returnStringUTF8. But I'm not quite sure this function will always return the right thing: a string encoded in UTF-8.
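For comparison, here is a minimal sketch of a per-string check, not a definitive fix. As written, the condition in returnStringUTF8 appears inverted: mb_detect_encoding($string, 'UTF-8', true) is truthy when the string is already valid UTF-8, so such strings would get encoded a second time. Also, utf8_encode only converts from ISO-8859-1 and is deprecated as of PHP 8.2. The helper name toUTF8 and the Windows-1252 fallback are assumptions for illustration (a guess for files created on Windows), and the mbstring extension is required.

```php
<?php
// Hypothetical helper (the name is not from the question): return $string as UTF-8.
// ASSUMPTION: anything that is not valid UTF-8 is Windows-1252. That is a guess
// for files made on Windows, since the real encoding cannot be known for certain.
function toUTF8(string $string, string $fallbackEncoding = 'Windows-1252'): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged,
    // so already-correct strings are never double-encoded.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }

    // Otherwise convert from the assumed source encoding.
    return mb_convert_encoding($string, 'UTF-8', $fallbackEncoding);
}

// "caf\xE9" is "café" in Windows-1252; the lone byte E9 is not valid UTF-8.
echo bin2hex(toUTF8("caf\xE9")), "\n";     // 636166c3a9
// Already valid UTF-8 passes through untouched.
echo bin2hex(toUTF8("caf\xC3\xA9")), "\n"; // 636166c3a9
```

With a check like this applied to each string, the file-level isUTF8 flag would no longer be needed for the conversion decision. The fallback encoding stays a guess either way, since a single-byte encoding cannot be reliably told apart from another by inspection alone.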

Possible duplicate of [Detect encoding and make everything UTF-8](https://stackoverflow.com/questions/910793/detect-encoding-and-make-everything-utf-8) — Oliver Nybo, May 09 '19 at 08:43
