85

Using JsonSerializer.Serialize(obj) will produce an escaped string, but I want the unescaped version. For example:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);
        Console.WriteLine(s);
    }
}

class A {
    public string Name {get; set;}
}

This produces the string {"Name":"\u4F60\u597D"}, but I want {"Name":"你好"}.

I created a code snippet at https://dotnetfiddle.net/w73vnO
Please help me.

Pang
Joey
  • Aside from making the data less readable, the default escaping also bloats the size of the json by 40 percent. And that is a significant change when you are caching or sending large json payloads. – Yogi Dec 19 '21 at 07:46

4 Answers

105

You need to set the JsonSerializer options not to encode those strings.

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

Then pass these options when you call the Serialize method.

var s = JsonSerializer.Serialize(a, jso);        

Full code:

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, jso);        
Console.WriteLine(s);

Result:

{"Name":"你好"}

If you need to print the result in the console, you may need to install an additional language so that the characters display correctly. Please refer here.
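For reference, the snippets above can be put together into one runnable program. This is a minimal sketch, not the answerer's exact code: the class name RelaxedDemo, the helper SerializeRelaxed, and the cached Options field are illustrative names. Setting Console.OutputEncoding to UTF-8 is an assumption about what the console needs and may not be enough on every terminal.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

public class A
{
    public string Name { get; set; }
}

public static class RelaxedDemo
{
    // Illustrative: the options are created once and reused for every call.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    // Hypothetical helper wrapping the Serialize call from the answer above.
    public static string SerializeRelaxed(A value)
    {
        return JsonSerializer.Serialize(value, Options);
    }

    public static void Main()
    {
        // Assumption: asking the console for UTF-8 output helps it show the characters.
        Console.OutputEncoding = Encoding.UTF8;

        var a = new A { Name = "你好" };
        Console.WriteLine(JsonSerializer.Serialize(a)); // {"Name":"\u4F60\u597D"}
        Console.WriteLine(SerializeRelaxed(a));         // {"Name":"你好"}
    }
}
```

Keeping the options in a static field avoids rebuilding the serializer metadata on every call, which is usually worthwhile when the same settings are used repeatedly.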

rcs
  • 6
    I could not believe my eyes when I found this: https://learn.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0 This is extremely surprising behavior by the default encoder. – arkod Nov 07 '19 at 20:33
  • 3
    It's important to understand the potential concerns with using this in your scenario and I would recommend safer alternatives if feasible. See https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#serialize-all-characters – ahsonkhan Jan 16 '20 at 02:24
  • 4
    Those docs never mention _why_ they avoid serializing those. Why was the decision made to encode everything when characters like the double-quote `"` and control chars have specific escape sequences for them?! – gregsdennis Jul 12 '20 at 22:46
  • 4
    Using an "unsafe" encoding is not the answer, the answer from ahsonkhan is correct – Karrde Jun 09 '21 at 04:24
  • I had to sign up to stackoverflow to upvote this! I agree that it is not safe but it answers the question. – Tono Nam Jul 05 '21 at 02:37
  • @gregsdennis the docs do mention why such a decision was made (choosing safer defaults for security): https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-character-encoding?view=netcore-3.0 – ahsonkhan Aug 18 '21 at 21:38
  • Can't we override the global settings? – CodingNinja Oct 07 '21 at 09:24
45

To change the escaping behavior of the JsonSerializer, you can pass in a custom JavaScriptEncoder by setting the Encoder property on the JsonSerializerOptions.

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder

The default behavior is designed with security in mind and the JsonSerializer over-escapes for defense-in-depth.

If all you need is to leave certain "alphanumeric" characters of a specific non-Latin language unescaped, I would recommend that you create a JavaScriptEncoder using the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.

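using System.Text.Encodings.Web; // JavaScriptEncoder
using System.Text.Unicode;       // UnicodeRanges

// Allow Basic Latin and CJK ideographs to pass through unescaped.
JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};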

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, options);
Console.WriteLine(s);

Doing so keeps certain safeguards in place; for instance, HTML-sensitive characters will continue to be escaped.
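To see which characters such an encoder lets through, a small experiment along these lines may help. It is only a sketch: RangeDemo and its Serialize helper are illustrative names, and the sample strings are arbitrary. The comments describe what is expected (allowed CJK text unescaped, HTML-sensitive characters and characters outside the listed ranges escaped) rather than quoting exact output.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class RangeDemo
{
    // Same encoder configuration as in the answer: Basic Latin plus CJK ideographs.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
    };

    // Hypothetical helper: serializes a bare string with the options above.
    public static string Serialize(string text)
    {
        return JsonSerializer.Serialize(text, Options);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        Console.WriteLine(Serialize("你好"));   // in an allowed range: written as-is
        Console.WriteLine(Serialize("<b>"));    // HTML-sensitive characters: still escaped
        Console.WriteLine(Serialize("Привет")); // Cyrillic is not in the allowed ranges: escaped
    }
}
```

If more scripts are needed, additional UnicodeRanges values can be passed to Create in the same way.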

I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping casually, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it's part of a web request that explicitly sets the charset to utf-8 (and is not going to be embedded within an HTML component as is), then it is probably OK to use this.

See the remarks section within the API docs: https://learn.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks

You could also consider specifying UnicodeRanges.All if you expect or need all languages to remain unescaped. This still escapes the HTML-sensitive ASCII characters (such as <, >, &, and ') that are most often involved in security vulnerabilities.

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};
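The difference between UnicodeRanges.All and UnsafeRelaxedJsonEscaping can be checked side by side. The following is a rough sketch under the assumption that a bare string is a good enough test input; AllRangesDemo and both helper methods are illustrative names, not part of any library.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class AllRangesDemo
{
    // Allows every Unicode range, but keeps the default encoder's safeguards.
    private static readonly JsonSerializerOptions AllRanges = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
    };

    // Minimal escaping, as in the accepted answer.
    private static readonly JsonSerializerOptions Relaxed = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    public static string SerializeAllRanges(string text)
    {
        return JsonSerializer.Serialize(text, AllRanges);
    }

    public static string SerializeRelaxed(string text)
    {
        return JsonSerializer.Serialize(text, Relaxed);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Both leave mixed-language text readable.
        Console.WriteLine(SerializeAllRanges("Привет 你好"));
        Console.WriteLine(SerializeRelaxed("Привет 你好"));

        // Only the relaxed encoder lets '<' through unescaped.
        Console.WriteLine(SerializeAllRanges("<"));
        Console.WriteLine(SerializeRelaxed("<"));
    }
}
```

In other words, UnicodeRanges.All widens the set of allowed letters, while the relaxed encoder additionally drops the HTML-oriented escaping.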

For more information and code samples, see: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding

See the Caution Note

ahsonkhan
  • 6
    @joey I know this came later but it should become the accepted answer – Ruben Bartelink Apr 08 '20 at 14:46
  • 1
    Here's the updated doc page link: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-character-encoding?view=netcore-3.0 – ahsonkhan Aug 18 '21 at 21:39
  • "If the JSON you are creating is written to a UTF-8 encoded file on disk or if its part of web request which explicitly sets the charset to utf-8 (and is not going to potentially be embedded within an HTML component as is), then it is probably OK to use this. [UnsafeRelaxedJsonEscaping]" to highlight this part – imsan Nov 25 '22 at 13:58
12

Use:

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};
Cyrus
10

You can use System.Text.RegularExpressions.Regex.Unescape(string) to unescape the Unicode characters after serialization: https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

Updating example from original question:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);

        var unescaped = System.Text.RegularExpressions.Regex.Unescape(s);

        Console.WriteLine(s);
        Console.WriteLine(unescaped);
    }
}

class A {
    public string Name {get; set;}
}

Output:

{"Name":"\u4F60\u597D"}
{"Name":"你好"}
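One caveat with this approach: Regex.Unescape does not know about JSON, so it also turns escaped quotes back into raw quotes. The sketch below checks whether the unescaped text still parses; UnescapeCaveat, IsValidJson, and SerializeThenUnescape are illustrative names, and the sample values are arbitrary.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class UnescapeCaveat
{
    // Returns true when the text can be parsed as a JSON document.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }

    // Serializes with the default encoder, then applies Regex.Unescape as in the answer.
    public static string SerializeThenUnescape(string name)
    {
        string json = JsonSerializer.Serialize(new { Name = name });
        return Regex.Unescape(json);
    }

    public static void Main()
    {
        // Plain text survives the round trip.
        Console.WriteLine(IsValidJson(SerializeThenUnescape("你好")));         // True

        // A value containing quotes ends up with raw quotes inside the string.
        Console.WriteLine(IsValidJson(SerializeThenUnescape("say \"hi\""))); // False
    }
}
```

For values that may contain quotes or backslashes, configuring the Encoder on JsonSerializerOptions, as in the other answers, avoids this problem.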
Steven Peirce