I'm writing logic to replace the unprintable characters in a string with a space, but I'm unsure whether ASCII characters and Unicode characters are handled the same way. I have read about doing this with the Regex.Replace function, but I don't understand how to check whether a character in the string matches the conditions below.

This is the requirement I was given. Replace the following with a space:

  • All ASCII characters with values of 0 through 31.
  • Unicode characters with values 127, 129, 141, 143, 144 and 157

I have tried this (I believe it works for ASCII characters), but how do I handle Unicode characters?

newPartNum = Regex.Replace(PartNum, @"[^\u0020-\u007E]", " ");

Any help would be appreciated.

  • What have you tried already? – maccettura May 10 '18 at 17:24
  • Hi @maccettura, I have this: newPartNum = Regex.Replace(ttPartR.PartNum, @"[^\u0020-\u007E]", " "); I believe this works for ASCII characters, but how do I handle Unicode characters? – user3038537 May 10 '18 at 17:28
  • Include that in your question, _not_ in the comments – maccettura May 10 '18 at 17:29
  • See also [Char](https://learn.microsoft.com/dotnet/csharp/language-reference/keywords/char) and [Strings](https://learn.microsoft.com/dotnet/csharp/programming-guide/strings/index). – Corak May 10 '18 at 17:30
  • You should also include a sample string so we have a [MCVE] – maccettura May 10 '18 at 17:39
  • What do you mean by "unprintable"? Would you consider a zero-width space to be "unprintable"? What about a right-to-left mark? Combining diacritics? Carriage returns and line feeds? You need to start by defining the problem. – Joe White May 10 '18 at 17:41
  • Possible duplicate of [Regex for all PRINTABLE characters](https://stackoverflow.com/questions/1247762/regex-for-all-printable-characters) – revo May 10 '18 at 18:08
  • Use the `Cc` category: `\p{Cc}` matches all control characters (U+0000-U+001F and U+007F-U+009F). – revo May 10 '18 at 18:10
  • .NET doesn't have any ASCII character datatypes; only Unicode. How about replacing control characters with [control picture](https://www.unicode.org/charts/nameslist/c_2400.html) characters? – Tom Blodget May 10 '18 at 22:25

2 Answers

With the help of LINQ, you can check whether each character is a control character. The code below removes the control characters from the string (it does not substitute a space):

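using System.Linq; // required for Where() and ToArray()
string str = ""; // your input string goes here
// Keep only the characters that are not control characters.
string res = new string(str.Where(c => !char.IsControl(c)).ToArray());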
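
Since the question asks for a space in place of each character rather than removal, a variant along these lines might be closer to the requirement. This is only a sketch: the class and method names (ControlCharDemo, ReplaceControlChars) are made up for illustration. Note that char.IsControl returns true for U+0000 through U+001F and U+007F through U+009F, so it also replaces code points such as 128 or 130 that the requirement does not list.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class ControlCharDemo
{
    // Returns a copy of input in which every control character
    // (as reported by char.IsControl) is replaced by a single space.
    public static string ReplaceControlChars(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Select maps each char one-to-one, so the length is preserved.
        return new string(input.Select(c => char.IsControl(c) ? ' ' : c).ToArray());
    }
}
```

For example, ControlCharDemo.ReplaceControlChars("AB\tC\u007FD") should give "AB C D", because both the tab and U+007F are control characters.
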
Arpit Gupta

Take a look at the IsControl method of the Char struct. If nothing else, its documentation describes the range of control characters.

Also, using a range of valid characters in your regex is certainly doable, but it may get messy with Unicode because the range is large. It might be better to match only the characters you need to replace. Again, see the Char.IsControl method for details.
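
As a rough sketch of matching only the characters to replace, the list from the question (0 through 31, plus 127, 129, 141, 143, 144 and 157) can be written as one character class using hexadecimal escapes. The names PartNumCleaner, Unwanted and Clean are assumptions for illustration, and the sketch assumes the values in the requirement are Unicode code points.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; only the listed code points are targeted.
public static class PartNumCleaner
{
    // 0-31   -> \u0000-\u001F
    // 127    -> \u007F
    // 129    -> \u0081, 141 -> \u008D, 143 -> \u008F,
    // 144    -> \u0090, 157 -> \u009D
    private static readonly Regex Unwanted =
        new Regex(@"[\u0000-\u001F\u007F\u0081\u008D\u008F\u0090\u009D]");

    // Replaces each listed character with a single space and
    // leaves every other character (including non-ASCII text) unchanged.
    public static string Clean(string partNum)
    {
        if (partNum == null) throw new ArgumentNullException(nameof(partNum));
        return Unwanted.Replace(partNum, " ");
    }
}
```

Unlike the negated class [^\u0020-\u007E] from the question, this leaves characters such as é or U+0080 alone, because it matches only what the requirement names.
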

Jeff R.