
I'm really bad with regex, but I want to remove all of these characters from a string: .,;:'"$#@!?/\*&^-+

string x = "This is a test string, with lots of: punctuations; in it?!.";

How can I do that?

Mahozad
Sjemmie
  • Why not simply run a string.Replace? The performance will undoubtedly be better and the code will be much more readable to boot. – Tejs May 03 '11 at 15:24
  • possible duplicate of [Best way to strip punctuation from a string](http://stackoverflow.com/questions/421616/best-way-to-strip-punctuation-from-a-string) – Brian Rasmussen May 03 '11 at 15:26
  • @Tejs: The performance may or may not be better, depending on the length of the string and the number of characters that need to be replaced. Also, the code would not necessarily be less readable. A lot of people have an aversion to using regular expressions because they do look cryptic, but just like any other code, commenting them will help with that. – Josh M. May 03 '11 at 15:27
  • @Josh M. - All valid points. However, I subscribe to the point that code should be self-documenting; if you have to make a comment to explain some code, then that code itself is not clear enough for me =D – Tejs May 03 '11 at 15:30
  • related: https://stackoverflow.com/q/421616/3995261 – YakovL Feb 11 '18 at 19:08
  • This was answered here already: https://stackoverflow.com/questions/421616/best-way-to-strip-punctuation-from-a-string – IAmTimCorey May 03 '11 at 15:25
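Tejs's comment suggests plain `string.Replace` instead of a regex. A minimal sketch of that idea, assuming the characters to remove are exactly the ones listed in the question (the class name `ReplaceStripper` is hypothetical):

```csharp
using System;

public static class ReplaceStripper
{
    // Exactly the characters listed in the question: .,;:'"$#@!?/\*&^-+
    private const string CharsToRemove = ".,;:'\"$#@!?/\\*&^-+";

    // Removes each listed character with one string.Replace call per character.
    public static string Strip(string input)
    {
        string result = input;
        foreach (char c in CharsToRemove)
        {
            result = result.Replace(c.ToString(), string.Empty);
        }
        return result;
    }
}
```

Unlike a "keep only word characters" regex, this removes only the listed characters, so anything not in the list (for example `_`, `%` or `(`) survives. Each `Replace` call scans the whole string once, so whether it beats a single regex pass depends on the input; measure if it matters.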

3 Answers

First, please read up on regular expressions. They're worth learning.

You can use this:

using System.Text.RegularExpressions;
string result = Regex.Replace("This is a test string, with lots of: punctuations; in it?!.", @"[^\w\s]", "");

Which means:

[   #Character class start.
^   #Negation: match any character NOT listed in this class.
\w  #Word characters (letters, digits, and the underscore).
\s  #Whitespace characters (space, tab, newline, ...).
]   #Character class end.

Taken together, it reads "replace any character that is not a word character or a whitespace character with nothing."
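A minimal, self-contained sketch of the pattern above applied to the question's string; the wrapper class `WordSpaceStripper` is hypothetical and exists only so the snippet compiles:

```csharp
using System.Text.RegularExpressions;

public static class WordSpaceStripper
{
    // Keeps word characters (\w) and whitespace (\s); every other character
    // is replaced with the empty string.
    public static string Strip(string input) =>
        Regex.Replace(input, @"[^\w\s]", string.Empty);
}
```

Because `\w` includes the underscore, `snake_case!` comes out as `snake_case`, while the apostrophe in `don't` is removed like any other non-word, non-whitespace character.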

Josh M.
  • I get Unrecognized escape sequence at \w\s – Sjemmie May 03 '11 at 15:29
  • Updating my answer... you just need to escape the backslashes. – Josh M. May 03 '11 at 15:30
  • This is a beautiful answer. I was so set on finding a way to replace all punctuation that I never thought of just KEEPING all the non-punctuation (which is way easier to denote with \w and \s). – Matthew Goode Jul 27 '16 at 18:49
  • Be careful, I think the `\w` character group allows underscores, `_`. http://stackoverflow.com/a/2998550/1804678 – Jess Jan 05 '17 at 16:02
  • `But don't match this!` -- the regex in this answer will remove the apostrophe from don't – MikeNereson Oct 20 '20 at 14:03
  • @MikeNereson That's true, but it's what the question asked for. – Josh M. Oct 21 '20 at 12:04
  • Be careful - things like ö, æ, ñ, ô etc. aren't necessarily "word characters" included in `\w`. – grofte Mar 16 '21 at 15:26
  • @grofte you are right, this [^\w\s] is a popular answer to the question of matching punctuation with regex, however very few people know this is NOT a correct answer for the multilingual use case. Neither did we. It caused a lot of trouble and unexpected issues in our system, and we found this bug today. – Tony Dec 28 '22 at 16:44
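The comments above raise two edge cases of `[^\w\s]`: underscores survive, and whether accented letters count as word characters depends on the engine. In .NET, `\w` is defined via Unicode categories (letters, nonspacing marks, decimal digits, connector punctuation), so ö or ñ are kept unless `RegexOptions.ECMAScript` is set, which restricts `\w` to `[a-zA-Z_0-9]`. A sketch of an alternative (not the answer's pattern) that lists the categories to keep explicitly and therefore also drops the underscore; the class name `UnicodeStripper` is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class UnicodeStripper
{
    // Keep letters (\p{L}), combining marks (\p{M}), numbers (\p{N}) and whitespace (\s).
    // \p{M} keeps decomposed accents (e.g. "o" + U+0308) attached to their letters.
    // The underscore is connector punctuation (Pc), so it is removed here.
    private static readonly Regex NotLetterNumberOrSpace =
        new Regex(@"[^\p{L}\p{M}\p{N}\s]");

    public static string Strip(string input) =>
        NotLetterNumberOrSpace.Replace(input, string.Empty);
}
```

Keeping in-word apostrophes (the `don't` case) would need an extra rule on top of this; the sketch does not attempt it.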

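This code shows the full regex replace process with a sample pattern that keeps only letters, numbers, and spaces in a string, replacing ALL other characters with an empty string. Note that the pattern is ASCII-only: accented letters, underscores, tabs, and newlines are removed as well.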

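using System;
using System.Text.RegularExpressions;

public static class TitleParser // hypothetical wrapper class so the snippet compiles
{
    //Regex to remove all non-alphanumeric characters (keeps ASCII letters, digits, and spaces)
    private static readonly Regex TitleRegex = new Regex("[^a-z0-9 ]+", RegexOptions.IgnoreCase);

    public static string Parse(string stringToParse)
    {
        string parsedString = TitleRegex.Replace(stringToParse, String.Empty);
        return parsedString;
    }
}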

And I've also stored the code here for future use: http://code.justingengo.com/post/Use%20a%20Regular%20Expression%20to%20Remove%20all%20Punctuation%20from%20a%20String

S. Justin Gengo

http://www.justingengo.com

This will probably do what you want:

using System.Text.RegularExpressions;
string result = Regex.Replace("This is a string...", @"\p{P}", "");

See "Regex: Match any punctuation character except . and _" and https://www.regular-expressions.info/posixbrackets.html
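`\p{P}` matches only the Unicode punctuation categories. Three characters from the question's list are classified as symbols instead: `$` (currency symbol), `^` (modifier symbol) and `+` (math symbol), so `\p{P}` alone leaves them in place. A minimal sketch comparing `\p{P}` with `[\p{P}\p{S}]`; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class PunctuationAndSymbolStripper
{
    // \p{P}: Unicode punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po), e.g. . , ; : ! ? # @ & * - _
    public static string PunctuationOnly(string input) =>
        Regex.Replace(input, @"\p{P}", string.Empty);

    // \p{S}: Unicode symbols (Sm, Sc, Sk, So), e.g. + $ ^
    public static string PunctuationAndSymbols(string input) =>
        Regex.Replace(input, @"[\p{P}\p{S}]", string.Empty);
}
```

Adding `\p{S}` also removes other symbols such as `€` or `©`, and both patterns remove the underscore, since it is connector punctuation.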

Mahozad