Unicode symbols become "??????????" at PHP side but data is NVARCHAR and SELECT shows correct data

Question

I have a database with SQL_Latin1_General_CP1_CI_AS collation that I need to convert to utf8_unicode_ci collation. Altering the table is not an option. MSSQL returns data like this:

[FirstName] => ??????????

but true data like this:

[FirstName] => مریم

I have tried these ways:

$utf8[] = utf8_encode($saleagent['FirstName']);
$utf8[] = iconv('ISO-8859-1', 'UTF-8',$saleagent['FirstName']);
$utf8[] = mb_convert_encoding($saleagent['FirstName'], 'UTF-8', 'ISO-8859-1');

the output is :

Array
(
    [0] => ??????????
    [1] => ??????????
    [2] => ??????????
)

Is there any way to convert the existing data to utf8_unicode_ci?

Edit

According to this answer, I changed my code:

In SQL:

CAST(p.FirstName AS VARBINARY(MAX)) as FirstName,

In PHP I wrote this function to fix the encoding:

public function fixEncoding($string)
{
    $original = $string;
    $string = trim($string);
    // Detect the input string encoding.
    $encoding = mb_detect_encoding($string, mb_detect_order(), false);

    if ($encoding == 'UTF-8' || $encoding == false) {
        $hex = bin2hex($string);

        $str = "";
        for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
            $str .= chr(hexdec($hex[$i] . $hex[$i + 1]));
        }
        $string = iconv('UCS-2LE', 'UTF-8', $str);
        return $string;

    } elseif ($encoding == 'ASCII') {
        $string = $original;
        return $string;
    }

    return $string;
}

let me explain the situation. I encounter with 3 types of strings in the mssql database:

Strings like English characters or numbers: xyz 32 -
Strings like Persian characters: مریم دختر خوبی است
Strings like both above: مریم دختر خوبی است - Maryam is a good girl.

What happens is, the in 1 and 2 the code convert it successfully. However, number 3 returns error while it tries to convert it with iconv and it throws the error below:

iconv(): Detected an incomplete multibyte character in input string

In this line:

$string = iconv('UCS-2LE', 'UTF-8', $str);

Do you have any solution to handle all 3 situations?

Is it `varchar` or `nvarchar` column? What does SSMS show when running same select? — Ivan Starostin, Sep 15 '18 at 09:36
it is nvarchar .I running my project on ubunto; so I don't have SSMS, but I have dataGripe and it shows data just fine. @IvanStarostin — Maryam, Sep 15 '18 at 09:46
Then data is fine. `Collation` affects comparison of values and their sorting. So the issue is on PHP side only. — Ivan Starostin, Sep 15 '18 at 09:52
Try this: https://stackoverflow.com/questions/20042778/cakephp-sqlserver-encoding — Ivan Starostin, Sep 15 '18 at 09:53
thanks. but my project is running by yii2. Do you have any solution for yii2 framework? @IvanStarostin — Maryam, Sep 15 '18 at 09:59
see this thread, https://stackoverflow.com/questions/10124930/encoding-sql-latin1-general-cp1-ci-as-into-utf-8 — Muhammad Omer Aslam, Sep 15 '18 at 11:30
have you looked into this thread https://stackoverflow.com/questions/26092388/iconv-detected-an-incomplete-multibyte-character-in-input-string — Muhammad Omer Aslam, Sep 17 '18 at 13:09

Zhorov · Answer 1 · 2018-09-17T11:27:30.197

Solution:

You may try with this:

encode your html and php files in UTF-8 (I usually use Notepad++ for this step).
select values from MS SQL Server with CONVERT(varbinary(max), UNICODECOLUMN);
convert values in PHP part with $value = iconv('UTF-16LE', 'UTF-8', $unicodetext);

You may change your function to:

<?php
...
public function fixEncoding($string) {
    return iconv('UTF-16LE', 'UTF-8', $string);
}
...
?>

Example:

<html>
    <head>
    <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    <meta charset="utf-8">
    </head>
    <body>
    <?php
    # Connection
    $server   = "server\instance,port";
    $user     = "user";
    $password = "password";
    $database = "database";
    $cinfo = array(
        "Database" => $database,
        "UID" => $user,
        "PWD" => $password
    );
    $conn = sqlsrv_connect($server, $cinfo);
    if ($conn === false)
    {
        echo "Error (sqlsrv_connect): ".print_r(sqlsrv_errors(), true);
        exit;
    }

    # Statement
    $sql = "
        SELECT CONVERT(varbinary(max), [NVARCHARTEXT]) AS [NVARCHARTEXT] 
        FROM [dbo].UnicodeText
    ";
    $stmt = sqlsrv_query($conn, $sql);
    if ($stmt === false) {
        echo "Error (sqlsrv_prepare): ".print_r(sqlsrv_errors(), true);
        exit;
    }
    while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC)) {
        $value = $row["NVARCHARTEXT"];
        $value = iconv('UTF-16LE', 'UTF-8', $value);
        echo $value.'</br>';
    }

    # End
    sqlsrv_free_stmt($stmt);
    sqlsrv_close($conn);
    ?>
    </body>
</html>

Notes:

Examples uses PHP Driver for SQL Server, SQL Server 2012, table with nvarchar column.

It's not working in some cases which I have a string like this: `????? ???????? ????? ?????? ???? 5?2` — Siamak Motlagh, Sep 17 '18 at 10:28
@SiamakA.Motlagh Sorry for that delay. Can you describe your situation? Thanks! — Zhorov, Sep 18 '18 at 10:35

score 0 · Answer 2 · answered Oct 09 '18 at 02:07

0

The likely causes:

The table/column was not declared to be UTF-8
The bytes in the client were not encoded UTF-8

answered Oct 09 '18 at 02:07

Rick James

135,179
13
127
222

Unicode symbols become "??????????" at PHP side but data is NVARCHAR and SELECT shows correct data

2 Answers2