
I want to serialize a .NET object to JSON that contains foreign-language strings such as Chinese or Russian. When I do that (using the code below), the resulting JSON contains "?" in place of those characters instead of the correct Unicode characters.

using Newtonsoft.Json;

var serialized = JsonConvert.SerializeObject(myObj, new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All, Formatting = Newtonsoft.Json.Formatting.Indented });

Is there a way to use the Json.NET serializer with foreign languages?

E.g.:

אספירין (Hebrew)

एस्पिरि (Hindi)

阿司匹林 (Chinese)

アセチルサリチル酸 (Japanese)

Many Thanks!

Ian Kemp
Jon S

2 Answers


It is not the serializer that is causing this issue; Json.NET handles foreign characters just fine. More likely you are doing one of the following:

  1. Using an inappropriate encoding (or not setting the encoding at all) when writing the JSON to a file or stream. You should probably be using Encoding.UTF8.
  2. Storing the JSON in a varchar column in your database rather than nvarchar. varchar stores text in a non-Unicode code page, so characters outside that code page are lost.
  3. Viewing the JSON with a viewer that does not support Unicode, uses the wrong encoding, and/or uses a font that lacks glyphs for the characters. The Windows command prompt window, for example, seems to have this issue.
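Cause 1 can be reproduced in memory, without files or a serializer. This is a minimal sketch assuming C# 9 / .NET 5+ (top-level statements); the JSON string is a hard-coded stand-in for serializer output, and Encoding.ASCII is used as a stand-in for any narrow code page, because on .NET Core and .NET 5+ Encoding.Default is already UTF-8 and would not show the problem.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for serializer output; the encoding step does not
// care how the JSON string was produced.
string json = "{\"Sample\":\"阿司匹林\"}";

// UTF-8 can represent every Unicode character, so the round trip is lossless.
string viaUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(json));

// ASCII cannot represent the Chinese characters; its default replacement
// fallback substitutes '?' for each character it cannot encode.
string viaAscii = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(json));

Console.WriteLine(viaUtf8 == json);                     // True
Console.WriteLine(viaAscii == "{\"Sample\":\"????\"}"); // True
```

The string itself is intact until it is turned into bytes; the '?' characters appear only at the encoding step.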

To prove that the serializer is not the problem, try compiling and running the following example program. It will create two different output files from the same JSON, one using UTF-8 encoding and the other using the default encoding. Open each file using Notepad. The "default" file will have the foreign characters as ? characters. In the UTF-8 encoded file, you should see all the characters are intact. (If you still don't see them, try changing the Notepad font to "Arial Unicode MS".)

You can also see the foreign characters are correct in the JSON using the Visual Studio debugger; just put a breakpoint after the line where it serializes the JSON and examine the json variable.

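using System;
using System.Collections.Generic;
using System.IO;
using System.Text; // needed for Encoding
using Newtonsoft.Json;

class Program
{
    static void Main(string[] args)
    {
        List<Foo> foos = new List<Foo>
        {
            new Foo { Language = "Hebrew", Sample = "אספירין" },
            new Foo { Language = "Hindi", Sample = "एस्पिरि" },
            new Foo { Language = "Chinese", Sample = "阿司匹林" },
            new Foo { Language = "Japanese", Sample = "アセチルサリチル酸" },
        };

        // The serialized string still contains the original characters.
        var json = JsonConvert.SerializeObject(foos, Formatting.Indented);

        // UTF-8 can represent every character, so this file is correct.
        File.WriteAllText("utf8.json", json, Encoding.UTF8);

        // On .NET Framework, Encoding.Default is the system ANSI code page, so
        // characters outside it are written as '?'. On .NET Core / .NET 5+,
        // Encoding.Default is UTF-8, so this file will also be correct there.
        File.WriteAllText("default.json", json, Encoding.Default);
    }
}

class Foo
{
    public string Language { get; set; }
    public string Sample { get; set; }
}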
Brian Rogers
  • Thanks Brian, helpful comments. I tracked the problem down: it was upstream in the processing. The data was zipped and stored in a binary database column, and when it was retrieved from the database it was decoded using the Default encoding, not UTF-8. Now fixed and working. `Newtonsoft.Json.JsonConvert.DeserializeObject>(Encoding.Default.GetString(decompressed), new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All });` – Jon S Sep 07 '15 at 10:16
  • Great! I'm glad you were able to fix the problem. – Brian Rogers Sep 07 '15 at 15:54
  • Thank you @BrianRogers, you helped me as well with this answer. – Claudio Corchez Sep 26 '19 at 16:26
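The fix described in the comments (decoding with UTF-8 instead of Encoding.Default) amounts to using the same encoding on both sides of the compression round trip. This is a minimal sketch assuming GZip compression (the comment only says the data was "zipped") and C# 9 / .NET 5+ top-level statements; the JSON literal is hypothetical, and Encoding.ASCII plays the role of a mismatched decoder.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical JSON standing in for the serialized object.
string json = "[{\"Language\":\"Hebrew\",\"Sample\":\"אספירין\"}]";

// Compress: encode the text as UTF-8 bytes, then gzip them
// (roughly what would be stored in the binary column).
byte[] compressed;
using (var output = new MemoryStream())
{
    using (var gzip = new GZipStream(output, CompressionMode.Compress))
    {
        byte[] raw = Encoding.UTF8.GetBytes(json);
        gzip.Write(raw, 0, raw.Length);
    }
    compressed = output.ToArray(); // ToArray still works after the stream is closed
}

// Decompress back to the original UTF-8 bytes.
byte[] decompressed;
using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
using (var result = new MemoryStream())
{
    input.CopyTo(result);
    decompressed = result.ToArray();
}

// Correct: decode with the same encoding that was used to encode.
string restored = Encoding.UTF8.GetString(decompressed);

// Wrong: a mismatched decoder turns each non-ASCII byte into '?'.
string garbled = Encoding.ASCII.GetString(decompressed);

Console.WriteLine(restored == json); // True
Console.WriteLine(garbled == json);  // False
```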

I have been working with Arabic text using System.Text.Json (rather than Json.NET), and I found the solution here

in the section "Serialize all characters":

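using System.Text.Encodings.Web;
using System.Text.Json;

var options = new JsonSerializerOptions
{
    // Write non-ASCII characters (e.g. Arabic) as-is instead of \uXXXX escapes
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
// weatherForecast is the object being serialized
string jsonString = JsonSerializer.Serialize(weatherForecast, options);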
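By default System.Text.Json escapes non-ASCII characters as \uXXXX sequences, which is still valid JSON but not human-readable. An alternative to UnsafeRelaxedJsonEscaping is JavaScriptEncoder.Create(UnicodeRanges.All), which writes letters from any script unescaped while still escaping HTML-sensitive characters. This is a minimal sketch assuming C# 9 / .NET 5+ top-level statements; the sample object and text are hypothetical.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical sample: Arabic text for "hello".
string arabic = "مرحبا";
var sample = new { Language = "Arabic", Sample = arabic };

// Default options: non-ASCII characters are escaped as \uXXXX sequences.
string escaped = JsonSerializer.Serialize(sample);

// Allow all Unicode ranges: letters are written as-is, but HTML-sensitive
// characters such as <, > and & are still escaped.
var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All),
    WriteIndented = true
};
string readable = JsonSerializer.Serialize(sample, options);

Console.WriteLine(escaped);
Console.WriteLine(readable);
```

Compared with UnsafeRelaxedJsonEscaping, this keeps the extra protection if the JSON is ever embedded in HTML.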