1

I have a database with SQL_Latin1_General_CP1_CI_AS collation that I need to convert to utf8_unicode_ci collation. Altering the table is not an option. MSSQL returns data like this:

[FirstName] => ??????????

but true data like this:

[FirstName] => مریم

I have tried these ways:

$utf8[] = utf8_encode($saleagent['FirstName']);
$utf8[] = iconv('ISO-8859-1', 'UTF-8',$saleagent['FirstName']);
$utf8[] = mb_convert_encoding($saleagent['FirstName'], 'UTF-8', 'ISO-8859-1');

the output is :

Array
(
    [0] => ??????????
    [1] => ??????????
    [2] => ??????????
)

Is there any way to convert the existing data to utf8_unicode_ci?

Edit


According to this answer, I changed my code:

In SQL:

CAST(p.FirstName AS VARBINARY(MAX)) as FirstName,

In PHP I wrote this function to fix the encoding:

public function fixEncoding($string)
{
    $original = $string;
    $string = trim($string);
    // Detect the input string encoding.
    $encoding = mb_detect_encoding($string, mb_detect_order(), false);

    if ($encoding == 'UTF-8' || $encoding == false) {
        $hex = bin2hex($string);

        $str = "";
        for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
            $str .= chr(hexdec($hex[$i] . $hex[$i + 1]));
        }
        $string = iconv('UCS-2LE', 'UTF-8', $str);
        return $string;

    } elseif ($encoding == 'ASCII') {
        $string = $original;
        return $string;
    }

    return $string;
}

let me explain the situation. I encounter with 3 types of strings in the mssql database:

  1. Strings like English characters or numbers: xyz 32 -
  2. Strings like Persian characters: مریم دختر خوبی است
  3. Strings like both above: مریم دختر خوبی است - Maryam is a good girl.

What happens is, the in 1 and 2 the code convert it successfully. However, number 3 returns error while it tries to convert it with iconv and it throws the error below:

iconv(): Detected an incomplete multibyte character in input string

In this line:

$string = iconv('UCS-2LE', 'UTF-8', $str);

Do you have any solution to handle all 3 situations?

Maryam
  • 365
  • 2
  • 15

2 Answers2

1

Solution:

You may try with this:

  • encode your html and php files in UTF-8 (I usually use Notepad++ for this step).
  • select values from MS SQL Server with CONVERT(varbinary(max), UNICODECOLUMN);
  • convert values in PHP part with $value = iconv('UTF-16LE', 'UTF-8', $unicodetext);

You may change your function to:

<?php
...
public function fixEncoding($string) {
    return iconv('UTF-16LE', 'UTF-8', $string);
}
...
?>

Example:

<html>
    <head>
    <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    <meta charset="utf-8">
    </head>
    <body>
    <?php
    # Connection
    $server   = "server\instance,port";
    $user     = "user";
    $password = "password";
    $database = "database";
    $cinfo = array(
        "Database" => $database,
        "UID" => $user,
        "PWD" => $password
    );
    $conn = sqlsrv_connect($server, $cinfo);
    if ($conn === false)
    {
        echo "Error (sqlsrv_connect): ".print_r(sqlsrv_errors(), true);
        exit;
    }

    # Statement
    $sql = "
        SELECT CONVERT(varbinary(max), [NVARCHARTEXT]) AS [NVARCHARTEXT] 
        FROM [dbo].UnicodeText
    ";
    $stmt = sqlsrv_query($conn, $sql);
    if ($stmt === false) {
        echo "Error (sqlsrv_prepare): ".print_r(sqlsrv_errors(), true);
        exit;
    }
    while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC)) {
        $value = $row["NVARCHARTEXT"];
        $value = iconv('UTF-16LE', 'UTF-8', $value);
        echo $value.'</br>';
    }

    # End
    sqlsrv_free_stmt($stmt);
    sqlsrv_close($conn);
    ?>
    </body>
</html>

Notes:

Examples uses PHP Driver for SQL Server, SQL Server 2012, table with nvarchar column.

Zhorov
  • 28,486
  • 6
  • 27
  • 52
0

The likely causes:

  • The table/column was not declared to be UTF-8
  • The bytes in the client were not encoded UTF-8
Rick James
  • 135,179
  • 13
  • 127
  • 222