I am using a regular StreamReader to read the response from the Facebook Graph API: https://graph.facebook.com/XXXX?access_token=&fields=id,name,about,address,last_name

When I read the response stream, it returns {"id":"XXXXX","name":"K\u0131r\u0131nt\u0131 Reklam"...}

My code is below. I tried explicitly using the UTF-8 and "iso-8859-9" (Turkish) encodings and setting Accept-Charset headers, without success. I have read Joel's famous article about encodings. It looks like each of the characters '\' 'u' '0' '1' '3' '1' arrives from Facebook as a literal character, whereas I thought this would have been 2 bytes for the value 131 in UTF-8. I am confused. I expect this string to be "Kırıntı Reklam".

I could simply find and replace those strings, but that would be far from elegant or maintainable. How should I properly process or convert the Facebook Graph API response for strings with accents?

using (WebResponse response = request.GetResponse())
{
    using (Stream dataStream = response.GetResponseStream())
    {
        if (dataStream != null)
        {
            using (StreamReader reader = new StreamReader(dataStream))
            {
                responseFromServer = reader.ReadToEnd();
            }
        }
    }
}

Thank you in advance

user3141326
  • [Working solution to decode the text before parsing it.](https://stackoverflow.com/a/50803989/396337) – Zyo Feb 23 '21 at 13:26

1 Answer

tl;dr: use a JSON library (I like Json.NET) and don't worry about it.

The JSON shown is valid JSON, where \uABCD in a JSON string represents a UTF-16 code unit [1]. This escape format is useful because it avoids Unicode stream encoding issues: it allows JSON to be represented entirely in 7-bit-clean ASCII characters (which are a subset of UTF-8).
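To make the escape format concrete, here is a minimal sketch that decodes a single \u escape by hand. It is for illustration only (a JSON library does this for you), and the `EscapeDemo` class and `DecodeEscape` helper are hypothetical names. Note that the four digits are hexadecimal, so \u0131 is code unit 0x0131 (305 decimal), not the decimal value 131.

```csharp
using System;
using System.Globalization;

static class EscapeDemo
{
    // Hypothetical helper: decodes one six-character \uXXXX escape into a char.
    public static char DecodeEscape(string escape)
    {
        // Skip the leading backslash and 'u', then read four hex digits.
        int codeUnit = int.Parse(escape.Substring(2, 4),
                                 NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);
        return (char)codeUnit;
    }

    static void Main()
    {
        string escape = @"\u0131";                 // six ASCII characters: \ u 0 1 3 1
        Console.WriteLine(escape.Length);          // 6
        Console.WriteLine((int)DecodeEscape(escape)); // 305
        Console.WriteLine(DecodeEscape(escape) == 'ı'); // True
    }
}
```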

Using a conforming JSON library to parse the JSON with such escapes would restore the JSON into an appropriate object-graph, of which some values will be properly-decoded String values. The library is responsible for understanding JSON and converting/reading it as appropriate - this includes correctly handling any such \u escape sequences.
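Any conforming parser should behave this way, not just Json.NET. As a sketch under that assumption, the following uses the built-in System.Text.Json API (available in .NET Core 3.0 and later) rather than Json.NET; the `ParseDemo` class and `ReadName` helper are hypothetical names, and the JSON is a shortened stand-in for the Graph API response.

```csharp
using System;
using System.Text.Json;

static class ParseDemo
{
    // Hypothetical helper: returns the decoded "name" property of a JSON object.
    public static string ReadName(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // GetString() resolves \u escapes as part of reading the JSON string.
            return doc.RootElement.GetProperty("name").GetString();
        }
    }

    static void Main()
    {
        // Verbatim string: the backslashes below are literal, as in the raw response.
        string json = @"{""id"":""XXXXX"",""name"":""K\u0131r\u0131nt\u0131 Reklam""}";
        Console.WriteLine(ReadName(json) == "Kırıntı Reklam"); // True
    }
}
```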

The stream itself (the JSON text) should be decoded with the encoding that the server declares, that a BOM indicates, or that has been pre-negotiated; in practice, that is simply UTF-8 here. This determines how the JSON text is encoded, but it has no bearing on the escape sequences found inside JSON strings.
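The two layers can be seen separately in a small sketch. It assumes the response body is the ASCII-only JSON from the question; the `StreamLayerDemo` class and `ReadText` helper are hypothetical names, and a `MemoryStream` stands in for the network stream. Because every byte is 7-bit ASCII, the choice of stream encoding does not change the text, and the escapes remain literal until a JSON parser handles them.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamLayerDemo
{
    // Hypothetical helper: decodes raw response bytes into JSON text
    // using the given stream encoding.
    public static string ReadText(byte[] body, Encoding encoding)
    {
        using (var stream = new MemoryStream(body))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for the bytes on the wire; every byte is 7-bit ASCII.
        byte[] body = Encoding.ASCII.GetBytes(
            @"{""name"":""K\u0131r\u0131nt\u0131 Reklam""}");

        string asUtf8 = ReadText(body, Encoding.UTF8);
        string asAscii = ReadText(body, Encoding.ASCII);

        Console.WriteLine(asUtf8 == asAscii);          // True
        Console.WriteLine(asUtf8.Contains(@"\u0131")); // True: still escaped text
    }
}
```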


1 Per RFC 4627, The application/json Media Type for JavaScript Object Notation (JSON):

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A though F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

Alternatively, there are two-character sequence escape representations of some popular characters. So, for example, a string containing only a single reverse solidus character may be represented more compactly as "\\".

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E"
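The surrogate-pair rule in the last quoted paragraph can be checked with the framework's own conversion methods. This is a minimal sketch; the `SurrogateDemo` class and `Combine` helper are hypothetical names.

```csharp
using System;

static class SurrogateDemo
{
    // Hypothetical helper: combines a UTF-16 surrogate pair into one code point.
    public static int Combine(char high, char low)
    {
        return char.ConvertToUtf32(high, low);
    }

    static void Main()
    {
        // The two code units from the RFC's G clef example.
        int codePoint = Combine('\uD834', '\uDD1E');
        Console.WriteLine(codePoint.ToString("X"));               // 1D11E

        // Going the other way, U+1D11E occupies two UTF-16 code units.
        Console.WriteLine(char.ConvertFromUtf32(0x1D11E).Length); // 2
    }
}
```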


For the doubters, here is a LINQPad example. This uses JSON.Net and imports the Newtonsoft.Json.Linq namespace.

var json = @"{""name"":""K\u0131r\u0131nt\u0131 Reklam""}";
json.Dump();                        // -> {"name":"K\u0131r\u0131nt\u0131 Reklam"}
var name = JObject.Parse(json)["name"].ToString();
(name == "Kırıntı Reklam").Dump();  // -> true
user2864740
  • The Facebook encoding is dead wrong, everyone complains about it, and the Json.NET library doesn't fix anything. – Zyo Feb 23 '21 at 13:20
  • @Zyo People complaining about valid things, even if such things are not expected, does not make the valid things "dead wrong". I've updated the answer with a demonstration snippet that can be run in LINQPad. JSON.Net understands/accepts such *valid and well-defined JSON*, and thus correctly deserializes the value. Make sure to use JSON-aware libraries/tools with JSON data. – user2864740 Feb 23 '21 at 18:29