3

I'm trying to pass data from my server and display it in a UWP Windows app. The data is stored in a MySQL database:

[two screenshots]

This is output via PHP to a web page at http://www.rwscripts.com/scorealerts/v3/request.php?action=getTeams using this code:

    // Serialize the data structure
    $result = json_encode($data, JSON_PRETTY_PRINT);
    // Send the JSON document
    header('Content-type: application/json; charset=utf-8');

    print $result;

I'm then reading this in my app with HttpWebRequest and deserializing the JSON with Json.NET:

            JArray obj = JsonConvert.DeserializeObject(str.Trim()) as JArray;
            if (obj == null || obj.Count == 0) return;

            foreach (NotificationTeam nt in from JObject team in obj
                select
                    new NotificationTeam
                    {
                        Title = team.Value<string>("teamName"),
                        TeamID = team.Value<int>("tid"),
                        Followers = team.Value<int>("followers")
                    })
            {
                nt.Notifications = ScoreManager.GetMgr().GetTeamNotification(nt.TeamID);

                notificationTeams.Add(nt);
            }

The output when displayed in my app looks like this:

[screenshot]

Which part of the flow needs to be changed to display the Unicode characters correctly?

Real World

4 Answers

5

You can't fix this downstream of the generated JSON, because the JSON itself is wrong. Here is why:

  • Special characters in team names (Köln) are stored as UTF-8 in your database. ö in UTF-8 is 0xc3 0xb6.
  • The output data, however, is then encoded (or just formatted) again as UTF-16 (aka Encoding.Unicode in C#). This is where the trouble starts. ö in UTF-16 (and UTF-32) has the value 0x00f6.
  • The two UTF-8 bytes get encoded as two separate UTF-16 characters, \u00c3 and \u00b6, instead of just \u00f6. So instead of one character, you end up with two UTF-16 characters that each represent one byte of the same UTF-8 character.
  • Your app recognises the \u escape sequences and turns them (completely correctly, from its point of view) into two separate UTF-16 characters (Ã¶).

Long story short, this is what happens to your strings:

ö in UTF-32 is f6 00 00 00 (little-endian)
ö in UTF-16 is f6 00 (little-endian)
ö in UTF-8 is c3 b6

  1. Köln (input)
  2. K[0xc3][0xb6]ln (SQL, UTF-8)
  3. K\u00c3\u00b6ln (JSON, UTF-8 bytes escaped as UTF-16 characters)
  4. KÃ¶ln (C#, UTF-16 decoded)
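The four steps above can be reproduced in a few lines of PHP. This is only a sketch of one plausible failure mode, not a diagnosis of the actual server: it assumes the extra conversion treats the UTF-8 bytes as ISO-8859-1, and it assumes the mbstring extension is available. The variable names are made up for illustration.

```php
<?php
// "Köln" as correct UTF-8: ö is the two bytes 0xC3 0xB6.
$utf8 = "K\xc3\xb6ln";

// Assumed failure mode: the bytes are treated as ISO-8859-1
// and converted to UTF-8 a second time.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";              // 4bc3b66c6e
echo bin2hex($doubleEncoded), "\n";     // 4bc383c2b66c6e

echo json_encode($utf8), "\n";          // "K\u00f6ln"
echo json_encode($doubleEncoded), "\n"; // "K\u00c3\u00b6ln"
```

The last line matches what the page currently serves, which is why the client ends up showing two characters for every accented letter.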

Since json_encode expects a UTF-8 string, I suspect the problem occurs somewhere between the database and the JSON encoding (PHP).
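To narrow down where it goes wrong, it may help to inspect the raw bytes PHP receives from the database before json_encode runs. The helper below is hypothetical (describeEncoding is not part of PHP or of the original code) and is only a heuristic: it looks for "Ã" followed by a character in the range U+0080 to U+00BF, which is the usual footprint of double-encoded characters such as ö. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: guess how a string is encoded.
function describeEncoding(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        return 'not valid UTF-8 (possibly ISO-8859-1)';
    }
    // "Ã" (U+00C3) followed by U+0080..U+00BF is what UTF-8 bytes for
    // characters like ö look like after being encoded a second time.
    if (preg_match('/\x{00C3}[\x{0080}-\x{00BF}]/u', $value) === 1) {
        return 'looks double-encoded';
    }
    return 'valid UTF-8';
}

echo describeEncoding("K\xc3\xb6ln"), "\n";         // valid UTF-8
echo describeEncoding("K\xc3\x83\xc2\xb6ln"), "\n"; // looks double-encoded
echo describeEncoding("K\xf6ln"), "\n";             // not valid UTF-8 (possibly ISO-8859-1)
```

Calling it on a team name right after the database fetch, together with bin2hex on the same value, should show whether the string is already broken at that point.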

This post might give you a hint as to where the encoding settings are inconsistent:

UTF-8-all-the-way-through
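One setting that is commonly checked in this context is the character set of the database connection itself. The question does not show how the PHP script connects, so the following is a sketch that assumes PDO with the MySQL driver; the helper names, host, and database name are placeholders.

```php
<?php
// Hypothetical helper: build a PDO DSN that pins the connection charset.
function buildDsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Host and database name are placeholders, not the real server settings.
function connect(string $user, string $password): PDO
{
    return new PDO(buildDsn('localhost', 'scorealerts'), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo buildDsn('localhost', 'scorealerts'), "\n";
// mysql:host=localhost;dbname=scorealerts;charset=utf8mb4
```

With mysqli, the equivalent step would be calling set_charset('utf8mb4') on the connection. If the rows were already stored double-encoded, fixing the connection alone would not be enough and the stored data would also need repairing.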

In case you need to tinker with your settings, the output you will want is:

"teamName": "1. FC K\u00f6ln" or "teamName": "1. FC Köln" (should be fine too)

Manfred Radlwimmer
0

Instead of this ..

$result = json_encode($data,JSON_PRETTY_PRINT);

.. maybe this .. ?

$result = json_encode($data,JSON_UNESCAPED_UNICODE);

.. or both.. ?

$result = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
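
For context, JSON_UNESCAPED_UNICODE only changes how non-ASCII characters are written in the output; both forms decode to the same string on the client. A small sketch with made-up sample data shows the difference, and also what the flag would do with a string that is already double-encoded:

```php
<?php
// Correct UTF-8 input: ö is 0xC3 0xB6.
$team = ['teamName' => "1. FC K\xc3\xb6ln"];

echo json_encode($team), "\n";
// {"teamName":"1. FC K\u00f6ln"}
echo json_encode($team, JSON_UNESCAPED_UNICODE), "\n";
// {"teamName":"1. FC Köln"}

// Double-encoded input: the flag only changes how the wrong characters are written.
$broken = ['teamName' => "1. FC K\xc3\x83\xc2\xb6ln"];
echo json_encode($broken, JSON_UNESCAPED_UNICODE), "\n";
// {"teamName":"1. FC KÃ¶ln"}
```

So the flag may be worth trying, but if the input to json_encode is already double-encoded, the output would be expected to stay wrong.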
Spooky
0

I believe you need to get the bytes of your Unicode characters and convert them to a string.

var bytes = Encoding.Unicode.GetBytes(NotificationTeam.Title);
NotificationTeam.Title = Encoding.ASCII.GetString(bytes);

OR

new NotificationTeam
                    {
                        Title = Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(team.Value<string>("teamName"))),
                        TeamID = team.Value<int>("tid"),
                        Followers = team.Value<int>("followers")
                    })
techspider
0

This might be due to a bug in the .NET Connector. In that case, you should specify

character_set_server=utf8mb4

in the config, or --character-set-server=utf8mb4 in the mysqld arguments.

druss