using (WebClient client = new WebClient())
{
    client.Encoding = Encoding.UTF8;
    string v = client.DownloadString("https://feed.mix.sina.com.cn/api/roll/get?pageid=155&lid=1686&num=1&page=1&callback=feedCardJsonpCallback&_=" + DateTime.Now.ToString("HHmmss"));
    string v1 = Regex.Split(v, ",\"title\":\"")[1].Split('\"')[0];
    Encoding utf8 = Encoding.UTF8;
    Encoding window1252 = Encoding.Default;
    byte[] postBytes = window1252.GetBytes(v1);
    string decodedText = utf8.GetString(postBytes);
    MessageBox.Show(decodedText);
}

v1 is supposed to be like this: "\u8428\u5c14\u74e6\u591a\u7387\u5148\u201c\u5403\u8783\u87f9\u201d \u6bd4\u7279\u5e01\u5c06\u5f00\u542f\u62c9\u7f8e\u201c\u540e\u7f8e\u5143\u65f6\u4ee3\u201d\uff1f"

I want to convert it to readable Chinese characters, but I keep failing to do so.

  • I would *strongly* suggest that you download the raw bytes to start with, rather than downloading it as a string in the "wrong" encoding and then trying to correct it. That's almost *always* a road to missing/broken data. – Jon Skeet Jul 07 '21 at 11:20
  • *Don't* convert anything. .NET strings are Unicode; they don't need any conversion. Windows is a Unicode OS, which means Windows strings are also Unicode unless you use ASCII Win32 APIs. What you're doing can only mangle the text. – Panagiotis Kanavos Jul 07 '21 at 11:20
  • (I'd also suggest that you migrate to HttpClient if you possibly can...) – Jon Skeet Jul 07 '21 at 11:20
  • Have you tried simply displaying the already decoded Unicode string, `v`? What happens if you use `MessageBox.Show(v)`? – Panagiotis Kanavos Jul 07 '21 at 11:21
  • BTW you'll find a lot of SO questions with Chinese, Arabic, Cyrillic or Greek text, which shows that no conversion is needed. SO is an ASP.NET site storing text in Unicode database fields, and the pages are UTF-8, like almost every other web site. That's why I can write `Αυτό Εδώ` without escape sequences or encoding. – Panagiotis Kanavos Jul 07 '21 at 11:23
  • BTW what's the actual content? That Regex is suspicious. Using a regular expression to parse a JSON string will only end up mangling the text, and *definitely* mangle any JSON text that uses escape sequences. A JSON deserializer like JSON.NET or System.Text.Json would return a proper Unicode string. – Panagiotis Kanavos Jul 07 '21 at 11:27
  • When I call MessageBox.Show(v), it shows "\u8428\u5c14\u74e6", still not the real characters. But if I go to a website that converts Unicode escapes to Chinese, I get the correct characters: \u8428 gives me 萨, which is the character I wanted. – user2349425 Jul 07 '21 at 11:36
  • You are downloading a string containing escaped Unicode; you need to un-escape it rather than doing anything related to text encoding. See this method, for example: https://stackoverflow.com/a/9738396/246342. As commented, a JSON parser would likely do this for you automatically. – Alex K. Jul 07 '21 at 12:02
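
The JSON-parser route suggested in the comments could look roughly like the sketch below. It uses System.Text.Json on a small, hypothetical JSON fragment with a single `title` property; the real feed's structure (and the JSONP callback wrapper that would have to be stripped first) is not shown here and is an assumption.

```csharp
using System;
using System.Text.Json;

class JsonUnescapeSketch
{
    static void Main()
    {
        // Hypothetical JSON fragment. The \\u in this C# literal produces a
        // literal backslash-u in the string, which is what the server sends.
        string json = "{\"title\":\"\\u8428\\u5c14\\u74e6\\u591a\"}";

        // The parser decodes JSON escape sequences while reading the value.
        using JsonDocument doc = JsonDocument.Parse(json);
        string title = doc.RootElement.GetProperty("title").GetString();

        // title now holds real characters, not escape sequences.
        Console.WriteLine(title.Length);          // 4 characters
        Console.WriteLine(title[0] == '\u8428');  // True: first character is 萨
    }
}
```

No `Encoding` conversion is involved: the text was never in the wrong encoding, it was only escaped.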

1 Answer

string r1 = Regex.Unescape(v1);

Solved it! Thanks to everyone especially Alex K.
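
For anyone landing here later, a minimal self-contained sketch of that fix follows. The input is a shortened piece of the `v1` value from the question, written as a verbatim string so the backslashes stay literal. Note that `Regex.Unescape` follows regular-expression escape rules rather than JSON's, so it may not behave identically on every JSON escape; a JSON parser, as suggested in the comments, is likely the safer choice for a full response.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexUnescapeSketch
{
    static void Main()
    {
        // Verbatim string: contains literal backslash-u sequences,
        // just like the text extracted from the downloaded response.
        string v1 = @"\u8428\u5c14\u74e6\u591a";

        // Converts each \uXXXX sequence into the character it names.
        string r1 = Regex.Unescape(v1);

        Console.WriteLine(v1.Length);                        // 24: still escaped
        Console.WriteLine(r1.Length);                        // 4: real characters
        Console.WriteLine(r1 == "\u8428\u5c14\u74e6\u591a"); // True
    }
}
```

The `Encoding.Default` / `Encoding.UTF8` round trip from the question can be dropped entirely; `r1` can be passed straight to `MessageBox.Show`.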