Unicode characters replace from string using C#

Question

string str = "our guests will experience \u001favor in an area";
bool exists = str.IndexOf("\u001", StringComparison.CurrentCultureIgnoreCase) > -1;

I want to find and replace these \u001 characters in the string. I have tried hard to solve this but am still stuck.

Please help me resolve this issue. Thanks in advance for your help.

Have a look at this post: https://stackoverflow.com/questions/28023682/how-do-i-remove-emoji-characters-from-a-string Not exactly the same problem but still it might be transferable. — Robin B, Aug 17 '18 at 08:08
Have you tried String.Replace? https://learn.microsoft.com/en-us/dotnet/api/system.string.replace?redirectedfrom=MSDN&view=netframework-4.7.2#System_String_Replace_System_String_System_String_ — Federico Navarrete, Aug 17 '18 at 08:08
Your string does not contain any `\u001` (whatever that is) character. It contains one `\u001f` character. — , Aug 17 '18 at 08:09
I think you could try something like described here: https://stackoverflow.com/questions/1522884/remove-all-non-ascii-characters-from-string — Dimitri, Aug 17 '18 at 08:22

score 2 · Answer 1 · answered Aug 17 '18 at 08:21

Somewhere deep inside the C# specification, you can find the following:

[Note: The use of the \x hexadecimal-escape-sequence production can be error-prone and hard to read due to the variable number of hexadecimal digits following the \x. For example, in the code:

string good = "\x9Good text";

string bad = "\x9Bad text";

it might appear at first that the leading character is the same (U+0009, a tab character) in both strings. In fact the second string starts with U+9BAD as all three letters in the word "Bad" are valid hexadecimal digits. As a matter of style, it is recommended that \x is avoided in favour of either specific escape sequences (\t in this example) or the fixed-length \u escape sequence. end note]
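To see the pitfall the note describes, you can inspect the first character of each string. This is a minimal sketch using C# top-level statements (C# 9 or later is assumed); the variable names mirror the spec's example.

```csharp
using System;

// \x takes one to four hex digits, greedily.
// In "\x9Good text" only "9" is hex ('G' is not), so the escape is U+0009 (tab).
string good = "\x9Good text";
// In "\x9Bad text" all of "9Bad" is hex, so the escape is U+9BAD.
string bad = "\x9Bad text";

Console.WriteLine(good[0] == '\t');        // True
Console.WriteLine((int)bad[0] == 0x9BAD);  // True
Console.WriteLine(bad.Substring(1));       // prints " text" (with the leading space)
```

The same greedy reading is why the fixed-length \u escape is the safer choice.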

And also:

unicode-escape-sequence::

\u hex-digit hex-digit hex-digit hex-digit

\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

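To simplify further: \u must be followed by exactly 4 hex digits and \U by exactly 8, never 3. So your string is read as "our guests will experience " followed by the single character U+001F and then "avor in an area". All four digits of "001f" belong to the escape, which is why "favor" seems to have lost its "f".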
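A small sketch (C# top-level statements assumed) makes the parse visible by locating U+001F in the question's string and printing what follows it:

```csharp
using System;

string str = "our guests will experience \u001favor in an area";

// \u consumes exactly four hex digits, "001f", so the 'f' is part of the escape
// and the literal text resumes with "avor".
int pos = str.IndexOf('\u001f');               // char overload: ordinal search
Console.WriteLine(pos);                        // 27
Console.WriteLine(str.Substring(pos + 1, 4));  // avor
Console.WriteLine(str.Contains("favor"));      // False
```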

score 0 · Answer 2 · answered Aug 17 '18 at 08:23

If we look at the C# language specification, ECMA-334, section 7.4.2 "Unicode character escape sequences", we find:

A Unicode escape sequence represents a Unicode code point. Unicode escape sequences are processed in identifiers (§7.4.3), character literals (§7.4.5.5), and regular string literals (§7.4.5.6). A Unicode escape sequence is not processed in any other location (for example, to form an operator, punctuator, or keyword).

unicode-escape-sequence::
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

So you must use exactly four hex digits with \u.
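A short sketch of both productions (C# top-level statements assumed). The same character can be written with \u and four digits or with \U and eight; code points above U+FFFF require \U and end up as a UTF-16 surrogate pair inside the string. Writing "\u001" with only three digits does not compile, so it cannot appear here.

```csharp
using System;

char viaSmallU = '\u001f';       // \u: exactly four hex digits
string viaBigU = "\U0000001F";   // \U: exactly eight hex digits, same code point
Console.WriteLine(viaBigU == viaSmallU.ToString());  // True

// A code point above U+FFFF needs \U and is stored as two UTF-16 code units.
string smiley = "\U0001F600";
Console.WriteLine(smiley.Length);                                 // 2
Console.WriteLine(char.IsSurrogatePair(smiley[0], smiley[1]));    // True
```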

In your example, it takes "001f" as those four hex digits.

The "\u001" in your example should not compile at all: the C# compiler (and so Visual Studio) reports error CS1009, "Unrecognized escape sequence."
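Putting this together, a minimal sketch of the question's code with the escape written out in full (C# top-level statements assumed). It uses an ordinal comparison as a judgment call: culture-sensitive comparisons may treat control characters as ignorable, which can make the search behave unexpectedly. The replacement value, an empty string here, is only illustrative.

```csharp
using System;

string str = "our guests will experience \u001favor in an area";

// All four hex digits: "\u001f", not "\u001".
bool exists = str.IndexOf("\u001f", StringComparison.Ordinal) > -1;

// String.Replace(string, string) is ordinal; substitute whatever text you need.
string cleaned = str.Replace("\u001f", "");

Console.WriteLine(exists);   // True
Console.WriteLine(cleaned);  // our guests will experience avor in an area
```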

score 0 · Accepted Answer · Vladyslav Kurkotov · 2018-08-17T08:33:37.690

Use a regular expression:

using System.Text.RegularExpressions;

var unicodeRegexp = new Regex(@"\x1f");
var testWord = "our guests will experience \u001favor in an area";
var newWord = unicodeRegexp.Replace(testWord, "text for replacement");

In the pattern, \x1f stands for \u001f: \x takes exactly two hex digits, so the leading zeros are dropped. See https://www.regular-expressions.info/unicode.html#codepoint
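A sketch comparing the answer's \x1f pattern with the four-digit \u001f form, which .NET regular expressions also accept (C# top-level statements assumed). The verbatim strings (@"...") hand the escape to the regex engine rather than to the C# compiler.

```csharp
using System;
using System.Text.RegularExpressions;

string testWord = "our guests will experience \u001favor in an area";

// In .NET regex syntax, \x takes exactly two hex digits and \u exactly four,
// so both patterns match the single character U+001F.
var viaX = new Regex(@"\x1f");
var viaU = new Regex(@"\u001f");

string a = viaX.Replace(testWord, "text for replacement");
string b = viaU.Replace(testWord, "text for replacement");

Console.WriteLine(a == b);  // True
Console.WriteLine(a);       // our guests will experience text for replacementavor in an area
```

For a single fixed character, String.Replace gives the same result without a regex; the regex form pays off once the pattern grows, for example to a character range.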

edited Aug 17 '18 at 08:33

answered Aug 17 '18 at 08:28

