
I want to filter out the special characters from a string while preserving some of them (for example, the apostrophe, &, -, and %).

This code will keep the apostrophe, but how can I add the other symbols I want to keep?

import re

msg = "Hi! I'm Mike, I like M&M. How are you?"

pattern = re.compile(r"[^\w']")
pattern.sub('', msg)

Desired output: "Hi I'm Mike I like M&M How are you"
Håken Lid
Alex

1 Answer


Add them like this:

import re
msg = "Hi! I'm Mike, I like M&M. How are you?"
pattern = re.compile(r"[^\s\w'&%-]")
print(pattern.sub('', msg))  # => Hi I'm Mike I like M&M How are you

See the Python demo

Details

  • [^ - start of a negated character class matching any char but...
    • \s - whitespace
    • \w - word char (letter, digit, or _)
    • ' - a single quote
    • & - a & symbol
    • % - a % symbol
    • - - a hyphen (since it is at the end, it will be parsed as a literal -)
  • ] - end of the character class.
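
The breakdown above can be wrapped in a small reusable helper. This is only a sketch: the function name strip_special and its keep parameter are hypothetical, not part of the original answer. It assumes re.escape is used to build the class, so characters such as - are escaped automatically and their order in keep does not matter.

```python
import re

def strip_special(text, keep="'&%-"):
    """Remove every char that is not whitespace, a word char, or listed in `keep`.

    `strip_special` and `keep` are illustrative names, not from the answer.
    """
    # re.escape backslash-escapes regex metacharacters (including '-'),
    # so the hyphen is always literal regardless of its position in `keep`.
    pattern = re.compile(r"[^\s\w" + re.escape(keep) + "]")
    return pattern.sub("", text)

msg = "Hi! I'm Mike, I like M&M. How are you?"
result = strip_special(msg)
print(result)  # => Hi I'm Mike I like M&M How are you
```

Passing a different keep string, for example strip_special("50%-off!", "%-"), changes which symbols survive without rewriting the pattern by hand.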

Note that you can escape the hyphen so that it is always parsed as a literal -, even if you later add more chars after it: r"[^\s\w'&%\-]".
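
To illustrate why the position of the hyphen matters, here is a minimal sketch comparing three variants of the class. The sample string and variable names are made up for the demonstration; the point is that an unescaped - between two chars is read as a range.

```python
import re

text = "a-b (c) *d* +e+"  # made-up sample input

# Hyphen at the end of the class: parsed as a literal '-'.
hyphen_last = re.compile(r"[^\s\w%-]")

# '+' added after an unescaped hyphen: "%-+" is now a RANGE from '%' to '+'
# (% & ' ( ) * +), and the hyphen itself is no longer kept.
hyphen_range = re.compile(r"[^\s\w%-+]")

# Escaped hyphen: literal '-' no matter what follows it.
hyphen_escaped = re.compile(r"[^\s\w%\-+]")

out_last = hyphen_last.sub("", text)
out_range = hyphen_range.sub("", text)
out_escaped = hyphen_escaped.sub("", text)

print(out_last)     # => a-b c d e
print(out_range)    # => ab (c) *d* +e+
print(out_escaped)  # => a-b c d +e+
```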

See this answer to learn more about how to use a literal hyphen inside a Python regex character class.

Wiktor Stribiżew