14

In Perl, s/[^\w:]//g removes all non-alphanumeric characters EXCEPT :

In Python I'm using re.sub(r'\W+', '', mystring), which removes all non-alphanumeric characters except the underscore (_). Is there any way to add exceptions? I don't want to remove characters such as = and .

Previously I took the opposite approach, i.e. replacing all unwanted characters using re.sub('[!@#\'\"$()]', '', mystring). However, I cannot predict which characters may appear in mystring, so I want to remove all non-alphanumeric characters except a few.

Google didn't provide an appropriate answer. The closest search result was "python regex split any \W+ with some exceptions", but this didn't help me either.

pythonRcpp

3 Answers

16

You can list every character that you do not want removed inside the negated character class.

re.sub(r'[^\w'+removelist+']', '',mystring)

Test

>>> import re
>>> removelist = "=."
>>> mystring = "asdf1234=.!@#$"
>>> re.sub(r'[^\w'+removelist+']', '',mystring)
'asdf1234=.'

Here the removelist variable is a string which contains the list of all characters you need to exclude from the removal.
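One caveat with plain string concatenation: characters such as ], ^, - or \ have a special meaning inside a character class, so a removelist containing them could break or change the pattern. A minimal sketch of one way around this, assuming Python 3 and using re.escape on the list first (the helper name strip_except and its keep parameter are hypothetical, not part of the answer above):

```python
import re

def strip_except(text, keep=""):
    """Remove every character that is neither a word character nor in keep.

    strip_except and keep are illustrative names, not from the original answer.
    """
    # re.escape backslash-escapes characters that are special in a regex,
    # which keeps ], ^, - and \ from changing the meaning of the class.
    pattern = r'[^\w' + re.escape(keep) + r']'
    return re.sub(pattern, '', text)

print(strip_except("asdf1234=.!@#$", "=."))  # asdf1234=.
print(strip_except("a-b]c^d!e", "-]^"))      # a-b]c^de
```

For a fixed, harmless list such as "=." the simple concatenation in the answer behaves the same; escaping only matters when the characters to keep are not known in advance.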

What does a negated character class mean?

When ^ appears at the start of a character class, it does not act as an anchor; instead, it negates the character class.

That is, a ^ inside a character class, as in [^abc], inverts the set of characters the class matches.

For example, [abc] matches a, b, or c, whereas [^abc] matches any single character other than a, b, or c.
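The two roles of ^ can be compared side by side. This is only an illustrative sketch; the sample string and variable names are made up for the demonstration:

```python
import re

text = "abcxyz"  # sample input, chosen only for illustration

# Outside brackets, ^ anchors the match to the start of the string.
anchored = re.findall(r'^a', text)

# Inside brackets, [abc] matches any single one of a, b, or c.
in_class = re.findall(r'[abc]', text)

# A ^ right after the opening bracket negates the class:
# it matches any single character that is NOT a, b, or c.
negated = re.findall(r'[^abc]', text)

print(anchored)  # ['a']
print(in_class)  # ['a', 'b', 'c']
print(negated)   # ['x', 'y', 'z']
```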

nu11p01n73R
  • Thanks @nu11p01n73R. I was not adding the remove list inside []. I gave something like '^w.=', which of course was not working. Could you please tell me the meaning of r and ^? ^ is usually used as "starts with", but here it seems to have a different meaning. – pythonRcpp Jan 14 '15 at 09:06
  • @user1977867 Yep, when `^` is inside a character class, as in `[^abc]`, it negates the meaning of the character class. That is, `[abc]` will match `a`, `b` or `c`, whereas `[^abc]` will not match `a`, `b` or `c`. In other words, it matches anything other than `a`, `b` or `c` – nu11p01n73R Jan 14 '15 at 09:08
  • 1
    Can I ask why you're calling removelist 'remove'list? It seems to me that it's a list of chars that you'd like to keep. I'm only mentioning it because it had me confused. – ikku100 Mar 31 '17 at 09:37
  • @ikku100 Ohh, I have mentioned it in the answer *Here the removelist variable is a string which contains the list of all characters you need to exclude from the removal.* – nu11p01n73R Apr 01 '17 at 13:51
9
re.sub(r'[^a-zA-Z0-9=]', '',mystring)

You can add any other characters you want to keep, such as _, to the character class.
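The explicit a-zA-Z0-9 class is not quite equivalent to \w. A small sketch of the difference, assuming Python 3 str patterns (where \w also matches the underscore and non-ASCII letters unless re.ASCII is set); the sample string is made up:

```python
import re

sample = "café_1=2"  # illustrative input with an accented letter and an underscore

# Explicit ASCII ranges: é and _ are not listed, so both are removed.
explicit = re.sub(r'[^a-zA-Z0-9=]', '', sample)

# \w on a str pattern in Python 3 is Unicode-aware and includes _.
unicode_word = re.sub(r'[^\w=]', '', sample)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so é goes but _ stays.
ascii_word = re.sub(r'[^\w=]', '', sample, flags=re.ASCII)

print(explicit)      # caf1=2
print(unicode_word)  # café_1=2
print(ascii_word)    # caf_1=2
```

Which form fits best depends on whether underscores and non-ASCII letters should survive the cleanup.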

vks
7

I believe the approach you describe in Perl can also be used in Python, e.g.:

re.sub(r'[^\w=]', '',mystring)

This would remove everything except word characters and =.

maillard