
I was trying to remove the single backslash from my string variable, but unfortunately it doesn't work and I don't know why. Please help me review it.

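// Roughly: "You have ignored this check item; please contact a doctor." followed by "Internal Medicine"
string s = "此检查项己被你忽略,请联系医生。\u2028内科";
string us = s.Replace(@"\", "dddd");
Console.Write(us);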

Did I miss something? Thanks.


Joe.wang
  • The backslash you are seeing in this particular string belongs to a Unicode escape sequence in the string literal; the runtime never sees it as a backslash. – Chetan Mar 03 '18 at 09:09
  • This backslash is part of the symbol; it is not actually a backslash. – Alexey Klipilin Mar 03 '18 at 09:09
  • @ChetanRanpariya If I just want to remove these Unicode characters from the string, how can I do that? Thanks. – Joe.wang Mar 03 '18 at 09:58
  • Simply remove the whole escape sequence: "\u2028". – InBetween Mar 03 '18 at 09:58
  • I tried `string us = s.Replace(@"\u2028", string.Empty);`, but it doesn't work. – Joe.wang Mar 03 '18 at 10:03
  • @InBetween Could you please show me the code? Thanks in advance! – Joe.wang Mar 03 '18 at 10:03
  • HTML has special characters; see https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references. To encode and decode them, use System.Net.WebUtility.HtmlEncode(string) and System.Net.WebUtility.HtmlDecode(string). – jdweng Mar 03 '18 at 10:33
  • @jdweng I don't want to decode these characters. I just want to remove all these Unicode characters. Thanks. – Joe.wang Mar 03 '18 at 10:38
  • Uh, they're _all_ Unicode characters. If you remove "all these Unicode characters", you're just removing everything in the string. – Nyerguds Mar 05 '18 at 12:59
  • If you want to know what \u2028 means, it's [not hard to find if you just search for it.](https://stackoverflow.com/questions/3072152/what-is-unicode-character-2028-ls-line-separator-used-for) – Nyerguds Mar 05 '18 at 13:01
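To make the first two comments concrete, here is a small check you can run. It is a sketch: the class name EscapeCheck is only for illustration, and Sample is a shortened piece of the question's string. It shows that the compiled string contains no backslash at all, because the compiler turned the escape \u2028 into a single character, U+2028 LINE SEPARATOR, so Replace(@"\", ...) has nothing to match.

```csharp
using System;
using System.Globalization;

// Shows that "\u2028" in source code is an escape sequence: the compiled
// string holds the single character U+2028, not a backslash.
public static class EscapeCheck
{
    // Shortened piece of the question's literal around the escape.
    public const string Sample = "医生。\u2028内科";

    public static void Main()
    {
        Console.WriteLine(Sample.IndexOf('\\'));     // -1: no backslash anywhere
        Console.WriteLine(Sample.IndexOf('\u2028')); // 3: the escape became one char
        Console.WriteLine(Sample.Length);            // 6 characters in total
        Console.WriteLine(char.GetUnicodeCategory('\u2028')); // LineSeparator
    }
}
```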

1 Answer


That backslash is part of an escape sequence, not the literal character "\".

If you want to remove the separator before 内科 ("Internal Medicine"), you can do:

string us = s.Replace("\u2028", string.Empty);

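Note that, unlike the version in your comment, there is no @ here. An @ in front of a string literal in C# makes it a verbatim string, in which escape sequences are not processed. So @"\u2028" is the six literal characters \, u, 2, 0, 2, 8, not the single character U+2028, and Replace finds nothing to remove.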
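A quick way to see the difference between the two literals (a sketch; VerbatimDemo is a made-up class name and Input is a shortened version of the question's string):

```csharp
using System;

// Compares a regular and a verbatim literal written with the same characters.
public static class VerbatimDemo
{
    public const string Input = "医生。\u2028内科";

    // Regular literal: \u2028 is an escape, so this is ONE character (U+2028).
    public const string Regular = "\u2028";

    // Verbatim literal: escapes are not processed, so this is SIX characters: \ u 2 0 2 8.
    public const string Verbatim = @"\u2028";

    public static void Main()
    {
        Console.WriteLine(Regular.Length);  // 1
        Console.WriteLine(Verbatim.Length); // 6

        // Only the regular literal matches the separator inside Input.
        Console.WriteLine(Input.Replace(Regular, string.Empty) == "医生。内科"); // True
        Console.WriteLine(Input.Replace(Verbatim, string.Empty) == Input);      // True: nothing removed
    }
}
```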

Take a look at these links for more info: Verbatim String, Escape Sequence

Edit: If you want to remove every character outside the ASCII range (a \uXXXX escape can stand for any of them), it's a bit more complicated; you'll need something like Regex. Add the using directive

using System.Text.RegularExpressions; 

and change the Replace call from above to

string us = Regex.Replace(s, @"[^\u0000-\u007F]", String.Empty);

It uses a regular expression to match every character outside the ASCII range \u0000 to \u007F and replaces each match with an empty string.
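Here is that pattern applied to two inputs (a sketch; NonAsciiDemo and StripNonAscii are illustrative names). Because every Chinese character, including the full-width 。, lies outside U+0000 to U+007F, the pattern strips the Chinese text as well as the separator, which is exactly what the comments below run into:

```csharp
using System;
using System.Text.RegularExpressions;

// Applies the answer's pattern [^\u0000-\u007F], i.e. "any character outside ASCII".
public static class NonAsciiDemo
{
    public static string StripNonAscii(string s) =>
        Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);

    public static void Main()
    {
        // ASCII text with one separator: only U+2028 is removed.
        Console.WriteLine(StripNonAscii("abc\u2028def")); // abcdef

        // Chinese text: every character is outside ASCII, so nothing is left.
        Console.WriteLine(StripNonAscii("医生。\u2028内科").Length); // 0
    }
}
```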

Link for Regex

EricChen1248
  • Thanks for your answer. What if I want to remove all the `\uxxxx` characters? – Joe.wang Mar 03 '18 at 10:16
  • Thanks for your answer. The code `Regex.Replace(s, @"[^\u0000-\u007F]", String.Empty);` removes all the characters. Why? – Joe.wang Mar 03 '18 at 11:39
  • Oh, I didn't realise that ahead of time; silly of me. All Chinese characters lie outside the ASCII range, so the pattern removes them too. In my own test I used English text with a single \uXXXX. Well, I'm at a loss, except for hand-coding all the characters you want to remove. \u2028 is a single character that the display doesn't know how to render, which is why it shows up like that instead of as Chinese characters. – EricChen1248 Mar 03 '18 at 12:19
  • @Joe.wang I told you in another post (and jdweng did here) that you can use `WebUtility.HtmlDecode()`. With its overload that takes a TextWriter, the final text will not have any Unicode symbols in it. You can do it this way: `TextWriter _writer = new StreamWriter([Output], false, Encoding.Unicode); WebUtility.HtmlDecode([Input], _writer); _writer.Flush(); _writer.Dispose();` – Jimi Mar 03 '18 at 16:44
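A middle ground between the ASCII pattern and hand-coding every character is to remove characters by Unicode category. This is a swapped-in technique, not something from the answer: a minimal sketch, assuming the goal is to drop only invisible line and paragraph separators (categories Zl and Zp, which contain U+2028 and U+2029) while keeping all Chinese and ASCII text. SeparatorCleaner and RemoveSeparators are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): removes only characters in the
// Unicode categories Zl (line separator, e.g. U+2028) and Zp (paragraph
// separator, e.g. U+2029), leaving Chinese text and ASCII untouched.
public static class SeparatorCleaner
{
    private static readonly Regex Separators = new Regex(@"[\p{Zl}\p{Zp}]");

    public static string RemoveSeparators(string s) =>
        Separators.Replace(s, string.Empty);

    public static void Main()
    {
        string s = "此检查项己被你忽略,请联系医生。\u2028内科";
        string us = RemoveSeparators(s);
        Console.WriteLine(us.IndexOf('\u2028'));      // -1: separator is gone
        Console.WriteLine(us.Length == s.Length - 1); // True: only one char removed
    }
}
```

If other invisible characters turn up later, more categories (for example \p{Cf} for format characters) could be added to the class, but check first that this doesn't remove anything you want to keep.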