32

I have a large string containing brackets, commas, and other punctuation. I want to strip all those characters but keep the spacing. How can I do this? As of now I am using:

strippedList = re.sub(r'\W+', '', origList)
j00niner

4 Answers

47
re.sub(r'([^\s\w]|_)+', '', origList)
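
To illustrate the difference from the pattern in the question, here is a minimal sketch. The sample string is made up for the example; only `origList` and the two patterns come from the posts above.

```python
import re

# Hypothetical sample input; origList is the variable name used in the question.
origList = "[foo, bar_baz] (qux)!  42"

# \W matches every non-word character, and whitespace is a non-word character,
# so the pattern from the question removes the spaces as well.
without_spaces = re.sub(r'\W+', '', origList)

# [^\s\w] matches anything that is neither whitespace nor a word character.
# The |_ alternative also removes underscores, which \w would otherwise keep.
stripped = re.sub(r'([^\s\w]|_)+', '', origList)

print(without_spaces)  # foobar_bazqux42
print(stripped)        # foo barbaz qux  42
```

If underscores should survive, dropping the `|_` alternative should be enough.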
Ignacio Vazquez-Abrams
10

A bit faster implementation:

import re

pattern = re.compile(r'([^\s\w]|_)+')  # raw string, so \s and \w reach the regex engine unchanged
strippedList = pattern.sub('', value)
pata kusik
8

The regular-expression-based versions might be faster (especially with a compiled expression), but I prefer this for clarity:

"".join([c for c in origList if c in string.letters or c in string.whitespace])

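The join() call looks a bit odd, but I think it is the idiomatic Python way to convert a list of characters into a string. Note that this needs import string, that string.letters exists only in Python 2 (Python 3 has string.ascii_letters), and that, unlike the regex versions, it also drops digits.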
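
A sketch of the same idea for Python 3, assuming ASCII-only input is acceptable. Including `string.digits` is an addition that is not in the answer above; it makes the result line up with what the `\w`-based patterns keep. The sample string is hypothetical.

```python
import string

# Hypothetical sample input; origList is the variable name used in the question.
origList = "[foo, bar_baz] (qux)!  42"

# Characters to keep: ASCII letters, digits (an addition, see above) and whitespace.
# A set makes each membership test a constant-time lookup.
keep = set(string.ascii_letters + string.digits + string.whitespace)

# join() accepts a generator directly, so the intermediate list is optional.
stripped = "".join(c for c in origList if c in keep)

print(stripped)  # foo barbaz qux  42
```

Non-ASCII letters such as accented characters are removed by this version, because they are not in `string.ascii_letters`.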

unwind
1

Demonstrating what characters you will get in the result:

>>> s = ''.join(chr(i) for i in range(256)) # all possible bytes
>>> re.sub(r'[^\s\w_]+','',s) # What will remain
'\t\n\x0b\x0c\r 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'
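
The output above corresponds to ASCII-only matching. On Python 3, `\w` and `\s` are Unicode-aware for `str` patterns by default, so more characters survive unless `re.ASCII` is passed. A small sketch of the difference (the variable names are illustrative):

```python
import re

s = ''.join(chr(i) for i in range(256))  # code points 0-255

# With re.ASCII, \w is [a-zA-Z0-9_] and \s is [ \t\n\r\f\v].
ascii_only = re.sub(r'[^\s\w]+', '', s, flags=re.ASCII)

# Without the flag, Python 3 also keeps non-ASCII letters such as 'é'.
unicode_aware = re.sub(r'[^\s\w]+', '', s)

print(repr(ascii_only))
print(len(ascii_only), len(unicode_aware))
```

The underscore inside `[^\s\w_]` is redundant, since `\w` already includes it, which is why `_` remains in the result.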

Docs: re.sub, Regex HOWTO: Matching Characters, Regex HOWTO: Repeating Things

Mark Tolonen