I have a database with SQL_Latin1_General_CP1_CI_AS collation that I need to convert to utf8_unicode_ci collation. Altering the table is not an option. MSSQL returns data like this:
[FirstName] => ??????????
but true data like this:
[FirstName] => مریم
I have tried these ways:
$utf8[] = utf8_encode($saleagent['FirstName']);
$utf8[] = iconv('ISO-8859-1', 'UTF-8',$saleagent['FirstName']);
$utf8[] = mb_convert_encoding($saleagent['FirstName'], 'UTF-8', 'ISO-8859-1');
the output is :
Array
(
[0] => ??????????
[1] => ??????????
[2] => ??????????
)
Is there any way to convert the existing data to utf8_unicode_ci?
Edit
According to this answer, I changed my code:
In SQL:
CAST(p.FirstName AS VARBINARY(MAX)) as FirstName,
In PHP I wrote this function to fix the encoding:
public function fixEncoding($string)
{
$original = $string;
$string = trim($string);
// Detect the input string encoding.
$encoding = mb_detect_encoding($string, mb_detect_order(), false);
if ($encoding == 'UTF-8' || $encoding == false) {
$hex = bin2hex($string);
$str = "";
for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
$str .= chr(hexdec($hex[$i] . $hex[$i + 1]));
}
$string = iconv('UCS-2LE', 'UTF-8', $str);
return $string;
} elseif ($encoding == 'ASCII') {
$string = $original;
return $string;
}
return $string;
}
let me explain the situation.
I encounter with 3 types of strings in the mssql
database:
- Strings like English characters or numbers:
xyz 32 -
- Strings like Persian characters:
مریم دختر خوبی است
- Strings like both above:
مریم دختر خوبی است - Maryam is a good girl.
What happens is, the in 1 and 2 the code convert it successfully. However, number 3 returns error while it tries to convert it with iconv
and it throws the error below:
iconv(): Detected an incomplete multibyte character in input string
In this line:
$string = iconv('UCS-2LE', 'UTF-8', $str);
Do you have any solution to handle all 3 situations?