I'm getting JSON from a web service with escaped characters such as \u201c. Parsing the response works perfectly: double quotes inside text values arrive as escape sequences, while the structural double quotes are not escaped, so the parser sees the right JSON structure. The problem appears after I write the JSON to a file and read it back: the JSON is corrupted. I no longer have \u201c, but " characters inside the text content.

  • If I encode it with utf-8, " is changed to the File Separator (28) character and - is changed to Device Control 3 (0x13), which results in a parsing exception.
  • If I encode it with ascii, " is changed to the ? character.
  • If I encode it with iso-8859-1, " stays as the decoded " character.

Is there any way to preserve the unencoded characters after writing and reading?

SAMPLE:

I'm using Newtonsoft.Json.Linq

Encoding encoding = Encoding.GetEncoding("ISO-8859-1");
webResponse = (HttpWebResponse)webRequest.GetResponse();
using (StreamReader streamReader = new StreamReader(webResponse.GetResponseStream(), encoding))
{
    responseString = streamReader.ReadToEnd();
}
JToken json = JObject.Parse(responseString);
using (StreamWriter stream = new StreamWriter(path, true, encoding))
{
    stream.Write(json.ToString());
}
string spoiledJsonString = File.ReadAllText(path, encoding);
JToken sureNotToBeCreated = JObject.Parse(spoiledJsonString); // EXCEPTION
Matheus Simon
  • It would be really helpful if you'd show a short but complete program demonstrating the problem. It's unclear how you're diagnosing this... you should use UTF-8. – Jon Skeet Nov 24 '14 at 17:00
  • Writing is one thing, the reading and display of the file is important as well, and we don't have any information about that. – Maarten Bodewes Nov 24 '14 at 17:04
  • any other info you want let me know. – Matheus Simon Nov 24 '14 at 17:14
  • You can't use `json.ToString()` if you don't treat the output as unicode afterwards. Either use proper unicode encoding like UTF-8 (the json standard defines json as a sequence of unicode code-points), or tell your json serializer to escape any non ASCII character. – CodesInChaos Nov 24 '14 at 17:43
  • See [Using StringEscapeHandling.EscapeNonAscii with Json.NET](http://stackoverflow.com/questions/14095247/using-stringescapehandling-escapenonascii-with-json-net) and [How to use Json.NET StringEscapeHandling.EscapeNonAscii](http://stackoverflow.com/questions/14106894/how-to-use-json-net-stringescapehandling-escapenonascii) – CodesInChaos Nov 24 '14 at 17:49
  • Please can you provide a working sample, like the one in my answer, that fails with UTF-8 (or any other Unicode encoding,) and JSON parsing? – Jodrell Nov 24 '14 at 17:57

1 Answer

If I write the test program,

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    private static void Main()
    {
        var encoding = Encoding.GetEncoding("ISO-8859-1");
        // U+201C LEFT DOUBLE QUOTATION MARK, which ISO-8859-1 cannot represent.
        var testString = new string(new[] { (char)0x201c });
        string roundTripped;

        using (var m = new MemoryStream())
        {
            using (var writer = new StreamWriter(m, encoding))
            {
                var reader = new StreamReader(m, encoding);
                writer.Write(testString);
                writer.Flush();
                // Rewind so the reader sees the bytes that were just written.
                m.Seek(0, SeekOrigin.Begin);
                roundTripped = reader.ReadToEnd();
            }
        }

        // Compares the original string with what came back from the stream.
        Debug.Assert(
            string.Equals(testString, roundTripped),
            "These strings should be equal.");
    }
}

I recreate your problem: the assertion fails because the quote is altered on the round trip.

If I change the encoding to Encoding.UTF8, the assertion passes.


As supported here, ISO-8859-1 is not a Unicode encoding (it covers only the first 256 code points), so it is a bad choice for encoding Unicode text.

As supported here, JSON text is Unicode.

So we can deduce that ISO-8859-1 is a bad choice for encoding JSON strings.
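To make that deduction concrete, here is a minimal sketch that compares the two encodings at the byte level. The class name `EncodingRoundTrip` and the helper `RoundTrip` are hypothetical names chosen for illustration; only the standard `System.Text.Encoding` calls are real API.

```csharp
using System;
using System.Text;

static class EncodingRoundTrip
{
    // Hypothetical helper: encodes text to bytes and decodes it again
    // with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string quote = "\u201c"; // LEFT DOUBLE QUOTATION MARK

        // UTF-8 stores U+201C as the three bytes E2-80-9C.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(quote)));

        var latin1 = Encoding.GetEncoding("ISO-8859-1");

        // UTF-8 can represent the character, so it survives unchanged.
        Console.WriteLine(RoundTrip(quote, Encoding.UTF8) == quote); // True

        // ISO-8859-1 has no byte for U+201C, so the encoder substitutes
        // another character and the original is lost.
        Console.WriteLine(RoundTrip(quote, latin1) == quote);        // False
    }
}
```

Which substitute character ISO-8859-1 produces depends on the encoder's fallback behaviour, so the sketch only checks that the original is not preserved.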


The program,

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

using Newtonsoft.Json.Linq;

class Program
{
    private static void Main()
    {
        var encoding = Encoding.UTF8;
        // A JSON object whose single value is U+201C LEFT DOUBLE QUOTATION MARK.
        var testJson = new JObject
            {
                new JProperty(
                    "AQuote",
                    new string(new[] { (char)0x201c }))
            };

        JObject roundTripped;

        using (var m = new MemoryStream())
        {
            using (var writer = new StreamWriter(m, encoding))
            {
                var reader = new StreamReader(m, encoding);
                writer.Write(testJson.ToString());
                writer.Flush();
                // Rewind so the reader sees the bytes that were just written.
                m.Seek(0, SeekOrigin.Begin);
                roundTripped = JObject.Parse(reader.ReadToEnd());
            }
        }

        // Compares the original value with the value parsed back from the stream.
        Debug.Assert(
            string.Equals(
                testJson["AQuote"].Value<string>(),
                roundTripped["AQuote"].Value<string>()),
            "These strings should be equal.");
    }
}

runs without the assertion failing, so I suspect your issue is something other than UTF-8.
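If the file really has to stay in a non-Unicode encoding, the alternative CodesInChaos raised in the comments is to have Json.NET escape every non-ASCII character. The following is a rough sketch of that idea, not a tested fix for your exact setup. The class name `AsciiSafeJson` and the helper `ToAsciiJson` are hypothetical; `JsonTextWriter`, its `StringEscapeHandling` property, and `JToken.WriteTo` come from Json.NET.

```csharp
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

static class AsciiSafeJson
{
    // Hypothetical helper: serializes a token so that every non-ASCII
    // character is written as a \uXXXX escape sequence.
    public static string ToAsciiJson(JToken token)
    {
        using (var stringWriter = new StringWriter())
        using (var jsonWriter = new JsonTextWriter(stringWriter))
        {
            jsonWriter.StringEscapeHandling = StringEscapeHandling.EscapeNonAscii;
            token.WriteTo(jsonWriter);
            jsonWriter.Flush();
            return stringWriter.ToString();
        }
    }

    static void Main()
    {
        var original = new JObject(new JProperty("AQuote", "\u201c"));
        string json = ToAsciiJson(original);

        // The serialized text contains only ASCII characters, so an ASCII
        // or ISO-8859-1 file can hold it without loss.
        Console.WriteLine(json.All(c => c < 128));

        // Parsing turns the escape sequence back into the original character.
        JObject parsed = JObject.Parse(json);
        Console.WriteLine(parsed["AQuote"].Value<string>() == "\u201c");
    }
}
```

In your sample, that would mean writing the result of a helper like this instead of `json.ToString()`. Writing and reading the file as UTF-8 is still the simpler option.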

Jodrell