Remove a single backslash from a string variable in C#

Question

I was trying to remove the single backslash from my string variable, but it doesn't work and I don't know why. Could you please review it?

string s = "此检查项己被你忽略，请联系医生。\u2028内科";
string us= s.Replace(@"\","dddd");
Console.Write(us);

Did I miss something? Thanks.

The backslash you see in this string belongs to a Unicode escape sequence in the literal. The runtime does not treat it as a backslash. – Chetan, Mar 03 '18 at 09:09
This backslash is part of the notation for a single character; it is not actually a backslash. – Alexey Klipilin, Mar 03 '18 at 09:09
What if I just want to remove these Unicode characters from the string? How can I do that? Thanks. @ChetanRanpariya – Joe.wang, Mar 03 '18 at 09:58
I tried `string us = s.Replace(@"\u2028", string.Empty);`, but it doesn't work. – Joe.wang, Mar 03 '18 at 10:03
Could you please show me the code? Thanks in advance! @InBetween – Joe.wang, Mar 03 '18 at 10:03
HTML has special characters. See https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references for the list. To encode and decode, use System.Net.WebUtility.HtmlEncode(string) and System.Net.WebUtility.HtmlDecode(string). – jdweng, Mar 03 '18 at 10:33
@jdweng I don't want to decode these characters. I just want to remove all of these Unicode characters. Thanks. – Joe.wang, Mar 03 '18 at 10:38
Uh, they're _all_ Unicode characters. If you remove "all these Unicode characters" you're just removing everything in the string. – Nyerguds, Mar 05 '18 at 12:59
If you want to know what \u2028 means, it's [not hard to find if you just search for it.](https://stackoverflow.com/questions/3072152/what-is-unicode-character-2028-ls-line-separator-used-for) – Nyerguds, Mar 05 '18 at 13:01

EricChen1248 · Accepted Answer · 2018-03-03T10:42:21.827

2

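That backslash starts an escape sequence; it is not the literal character "\". In the compiled string, \u2028 is a single character and contains no backslash at all.

If you want to remove the \u2028 character that sits before 内科, you can do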

string us = s.Replace("\u2028", string.Empty);

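Note that, unlike the version in your comment, there is no @. An @ in front of a string literal in C# makes it a verbatim string, which means escape sequences such as \u2028 are not processed. So @"\u2028" is the six characters backslash, u, 2, 0, 2, 8, and Replace finds no match for them in s.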
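To make the difference concrete, here is a minimal sketch comparing the regular literal with the verbatim one. The class name `EscapeDemo` and the helper `RemoveLineSeparator` are made up for illustration; only `string.Length`, `string.Contains`, and `string.Replace` are used.

```csharp
using System;

public static class EscapeDemo
{
    // Hypothetical helper: removes every U+2028 (LINE SEPARATOR) character.
    public static string RemoveLineSeparator(string s)
    {
        // Regular literal: the compiler turns \u2028 into one character.
        return s.Replace("\u2028", string.Empty);
    }

    public static void Main()
    {
        string s = "此检查项己被你忽略，请联系医生。\u2028内科";

        Console.WriteLine("\u2028".Length);   // 1: a single character, no backslash
        Console.WriteLine(@"\u2028".Length);  // 6: backslash, 'u', and four digits
        Console.WriteLine(s.Contains(@"\"));  // False: s holds no backslash

        Console.WriteLine(s.Length);                       // 19
        Console.WriteLine(RemoveLineSeparator(s).Length);  // 18
    }
}
```

This is why `s.Replace(@"\", "dddd")` and `s.Replace(@"\u2028", string.Empty)` both return the string unchanged: the text they look for never appears in `s`.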

Take a look at these links for more info: Verbatim String, Escape Sequence

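Edit: If you want to remove every non-ASCII character (anything above \u007F, which is what you would normally write as \uXXXX), it's a bit more complicated and you'll need something like Regex. Be aware that this also removes the Chinese text, because those characters are non-ASCII too. Add the using directive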

using System.Text.RegularExpressions;

and change the replace from above to

string us = Regex.Replace(s, @"[^\u0000-\u007F]", String.Empty);

Basically, it uses pattern matching to find every character outside the ASCII range \u0000-\u007F and replaces each one with an empty string.
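As a rough sketch of a narrower alternative: if the goal is to drop only invisible separator and control characters such as \u2028 while keeping the Chinese text, one option is to match Unicode general categories instead of the non-ASCII range. The choice of categories below (Zl, Zp, Cc, Cf) is an assumption about which characters are unwanted, and `UnicodeCleanup`, `StripNonAscii`, and `StripInvisible` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeCleanup
{
    // The pattern from the answer: removes everything outside ASCII,
    // which includes all Chinese characters and punctuation.
    public static string StripNonAscii(string s)
    {
        return Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
    }

    // Assumed alternative: removes line separators (Zl), paragraph
    // separators (Zp), control characters (Cc), and format characters (Cf).
    // Note that Cc also covers \r, \n, and \t, so those are removed too.
    public static string StripInvisible(string s)
    {
        return Regex.Replace(s, @"[\p{Zl}\p{Zp}\p{Cc}\p{Cf}]", string.Empty);
    }

    public static void Main()
    {
        string s = "此检查项己被你忽略，请联系医生。\u2028内科";

        Console.WriteLine(StripNonAscii(s).Length);   // 0: nothing is left
        Console.WriteLine(StripInvisible(s).Length);  // 18: only U+2028 is gone
    }
}
```

If ordinary line breaks or tabs should survive, drop `\p{Cc}` from the character class or list the unwanted characters explicitly.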

Link for Regex

edited Mar 03 '18 at 10:42

answered Mar 03 '18 at 10:13


Thanks for your answer. What if I want to remove all the `\uxxxx` characters? – Joe.wang Mar 03 '18 at 10:16
Thanks for your answer. The code `Regex.Replace(s, @"[^\u0000-\u007F]", String.Empty);` removes all the characters. Why? – Joe.wang Mar 03 '18 at 11:39
Oh, I didn't realise that ahead of time. Silly me. All Chinese characters are non-ASCII Unicode characters, so the pattern matches them too. In my own test I used English text with a single \uXXXX. Huh, well, I'm at a loss, except for hand-coding all the ones you want to remove. The \u2028 is one character that the computer doesn't know how to render, which is why it shows up like that instead of as Chinese characters. – EricChen1248 Mar 03 '18 at 12:19
@Joe.wang I told you in another post (and jdweng said it here) that you can use `WebUtility.HtmlDecode()`. With its overload that takes a TextWriter, the final text will not have any Unicode symbols in it. You can do it this way: `TextWriter _writer = new StreamWriter([Output], false, Encoding.Unicode); WebUtility.HtmlDecode([Input], _writer); _writer.Flush(); _writer.Dispose();` – Jimi Mar 03 '18 at 16:44
