1

I'm reading a json file where some fields have string like the following: "Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."

The final end reslt should look like this "Eduardo Fonseca Bolaños compartió una publicación."

  • Is there any out of the box converted to do this using C#?
  • Which is the correct way to convert these kinds of json data?
anthony sottile
  • 61,815
  • 15
  • 148
  • 207
Eduardo Fonseca
  • 435
  • 1
  • 6
  • 19

2 Answers2

2

You can use Json.NET library to decode the string. The deserializer decodes the string automatically.

public class Example
{
    public String Name { get; set; }
}
// 
var i = @"{ ""Name"" : ""Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."" }";
var jsonConverter = Newtonsoft.Json.JsonConvert.DeserializeObject(i);

// Encode the string to UTF8
byte[] bytes = Encoding.Default.GetBytes(jsonConverter.ToString());
var myString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(myString);

// Deserialize using class
var sample = Newtonsoft.Json.JsonConvert.DeserializeObject<Example>(i);
byte[] bytes = Encoding.Default.GetBytes(sample.Name);
var myString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(myString);

The output is:

{
  "Name": "Eduardo Fonseca Bolaños compartió una publicación."
}

Option 2

You can use System.Web.Helpers.Json.Decode method. You won't need to use any external libraries.

Rushi Vyas
  • 66
  • 4
  • Thanks @Rushi, that's what I tried first, but when using the JSON .NET, the string I get ends up like this { "Name": "Eduardo Fonseca Bolaños compartió una publicación." } – Eduardo Fonseca Mar 18 '19 at 17:27
  • You will have to convert the string to UTF8. I have edited my original post to add the piece of code. – Rushi Vyas Mar 18 '19 at 19:30
  • This doesn't seem scalable. You are writing custom deserialising code for every property? I found this because I'm trying to adapt an existing open source project to allow unicode strings, and can't see how this would work in non-trivial cases. I have thousands of string properties in my JSON model classes. – AnnanFay Nov 19 '22 at 20:08
0

Here is the fix for this specific situation

        private static Regex _regex = 
        new Regex(@"(\\u(?<Value>[a-zA-Z0-9]{4}))+", RegexOptions.Compiled);
    private static string ConvertUnicodeEscapeSequencetoUTF8Characters(string sourceContent)
    {
        //Check https://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string
        return _regex.Replace(
            sourceContent, m =>
            {
                var urlEncoded = m.Groups[0].Value.Replace(@"\u00", "%");
                var urlDecoded = System.Web.HttpUtility.UrlDecode(urlEncoded);
                return urlDecoded;
            }
        );
    }

Based on Replace unicode escape sequences in a string

Eduardo Fonseca
  • 435
  • 1
  • 6
  • 19