I have created a function to convert the following text to UTF-8, as it appeared to be in Windows-1252 format, due to being copied to a database table from a Word Document.
Testing weird character’s correction
This seems to fix the dodgy ’
character. However i'm not getting � in the following:
Devon�s most prominent dealerships
When passing the following through the same function:
Devon's most prominent dealerships
Below is the code which does the converting:
function Windows1252ToUTF8($text) {
return mb_convert_encoding($text, "Windows-1252", "UTF-8");
}
Edit:
The database can't be changed due to holding thousands of custom records. I tried the below but the mb_detect_encoding
thinks character’s correction
is UTF-8.
function Windows1252ToUTF8($text) {
if (mb_detect_encoding($text) == "UTF-8") {
return $text;
}
return mb_convert_encoding($text, "Windows-1252", "UTF-8");
}
Edit 2: Just tried the example from the PHP Documentation:
$str = 'áéóú'; // ISO-8859-1
echo "<pre>";
var_dump(mb_detect_encoding($str, 'UTF-8')); // 'UTF-8'
var_dump(mb_detect_encoding($str, 'UTF-8', true)); // false
echo "</pre>";
die();
but this simply outputs:
string(5) "UTF-8" string(5) "UTF-8"
So I can't even detect the encoding of the string :S
Edit 3: This seems to do the trick:
function Windows1252ToUTF8($text) {
$badChars = [ "â", "á", "ú", "é", "ó" ];
$match = preg_match("/[".join("",$badChars)."]/", $text);
if ($match) {
return mb_convert_encoding($text, "Windows-1252", "UTF-8");
}
return $text;
}
Edit 4: I have matched the hex values to their corresponding values. However when I get to the weird characters they don't appear to match.