TRIM in python for all whitespace characters

Question

I am looking for something like TRIM() in Python, but .strip() doesn't remove every whitespace character. Here's an example:

>>> s.strip()
'Elvis Presley made his film debut in this tale of three brothers who, while serving in the Confederate Army, steal a Union Army payroll. \xc2\xa0'

>>> s2.strip()
'Elvis Presley made his film debut in this tale of three brothers who, while serving in the Confederate Army, steal a Union Army payroll.'

>>> s.strip()==s2.strip()
False

How would I trim all whitespace characters at the edges of the text so that s.trim() == s2.trim(), other than with a hackish s.strip('\xc2\xa0').strip()?

Related: https://stackoverflow.com/questions/10993612/python-removing-xa0-from-string#11566398 — dfundako, Sep 20 '18 at 21:29
There is [`string.whitespace`](https://docs.python.org/3/library/string.html#string.whitespace) defining what Python considers a whitespace. `\xa0` doesn't belong in that list, though. Even `\s` in its regex engine won't recognize it as whitespace so you'll have to do your own _hackish_ approach to remove the chars you want treated as whitespace. — zwer, Sep 20 '18 at 21:34
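A short check of the comment's point, assuming Python 3 (where `str` is Unicode text rather than the Python 2.7 byte strings in the question): `'\xa0'` is indeed absent from `string.whitespace`, but `str.isspace()`, `str.strip()` and the `\s` class in `str` regex patterns all treat it as whitespace. The variable name `nbsp` is just for illustration.

```python
import re
import string

nbsp = '\xa0'  # U+00A0 NO-BREAK SPACE as a Python 3 str

# string.whitespace only lists the ASCII whitespace characters
print(nbsp in string.whitespace)      # False

# Unicode-aware str methods do count it as whitespace
print(nbsp.isspace())                 # True
print(repr('payroll. \xa0'.strip()))  # 'payroll.'

# For str patterns, \s matches Unicode whitespace by default in Python 3
print(re.sub(r'\s+$', '', 'payroll. \xa0'))  # payroll.
```

In Python 2, the comment's description holds: byte strings and non-`re.UNICODE` patterns only know ASCII whitespace.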

score 2 · Accepted Answer · answered Sep 20 '18 at 21:37

Since you are using Python 2.7, first convert your string to unicode and then strip:

s = unicode('test \xc2\xa0', "UTF-8")
s.strip()

yields:

u'test'

This causes Python to decode the UTF-8 byte sequence \xc2\xa0 into the single Unicode character U+00A0 (NO-BREAK SPACE), which unicode.strip() treats as whitespace and removes.

Without the decode, s is a Python 2 byte string, and str.strip() only recognizes ASCII whitespace; the bytes \xc2 and \xa0 are not among them.
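For comparison, a minimal Python 3 sketch of the same idea: there the raw data would be a `bytes` object, and `bytes.strip()` only removes ASCII whitespace, so the non-breaking space survives until the bytes are decoded. The helper name `trim` and the UTF-8 default are assumptions for illustration, not part of the original answer.

```python
def trim(raw: bytes, encoding: str = 'utf-8') -> str:
    """Hypothetical helper: decode raw bytes, then strip Unicode whitespace."""
    return raw.decode(encoding).strip()

raw = b'test \xc2\xa0'  # UTF-8 encoding of 'test ' followed by U+00A0

print(raw.strip())  # b'test \xc2\xa0' (bytes.strip ignores non-ASCII bytes)
print(trim(raw))    # test
```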

Woody1193 · Answer 2 · 2018-09-20T22:03:30.257

I would suggest using the replace method:

s = s.replace('\xc2', '').replace('\xa0', '').strip()

You could encapsulate this logic if you have a large number of characters to remove:

def replace_many(base_string, *to_remove):
    # Delete every occurrence of each substring in to_remove
    result = base_string
    for r in to_remove:
        result = result.replace(r, '')
    return result

>>> replace_many(s, '\xc2', '\xa0').strip() == s2.strip()
True
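One caveat with `replace`: it deletes the characters everywhere in the string, not only at the edges, and deleting a lone `\xc2` or `\xa0` byte from UTF-8 data can corrupt other characters whose encoding contains those bytes (for example, 'à' is `\xc3\xa0`). If only the edges should be trimmed, passing an explicit character set to `strip` is an alternative. This is a sketch assuming Python 3 text (`str`); `trim_edges` and its `extra` parameter are hypothetical names.

```python
import string

def trim_edges(text: str, extra: str = '\xa0') -> str:
    """Hypothetical helper: strip ASCII whitespace plus `extra` chars from both ends only."""
    return text.strip(string.whitespace + extra)

s = '\xa0Elvis\xa0Presley, payroll. \xa0'
print(repr(trim_edges(s)))  # 'Elvis\xa0Presley, payroll.'
```

The interior non-breaking space is preserved. In Python 3 a bare `strip()` already removes U+00A0; the explicit set matters when you want to control exactly which characters count, e.g. adding U+200B ZERO WIDTH SPACE, which `str.isspace()` does not treat as whitespace.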

You could also implement this using reduce:

# In Python 2 (the built-in reduce does not accept keyword arguments)
result = reduce(lambda a, r: a.replace(r, ''), ['\xc2', '\xa0'],
                base_string).strip()

# In Python 3
import functools
result = functools.reduce(lambda a, r: a.replace(r, ''), ['\xc2', '\xa0'],
                          base_string).strip()
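In Python 3 there is also a single-pass alternative to chaining `replace` or folding it with `reduce`: `str.maketrans` with a third argument builds a table of characters to delete, and `str.translate` applies it. This is a swapped-in technique rather than part of the original answer, and like `replace` it removes the characters anywhere in the string. The helper name `remove_chars` is hypothetical, and the sample assumes mis-decoded text containing 'Â' (U+00C2) followed by U+00A0.

```python
def remove_chars(text: str, chars: str) -> str:
    """Hypothetical helper: delete every occurrence of each char in `chars`, then strip."""
    table = str.maketrans('', '', chars)  # maps each char in `chars` to None
    return text.translate(table).strip()

base_string = 'steal a Union Army payroll. \xc2\xa0'
print(remove_chars(base_string, '\xc2\xa0'))  # steal a Union Army payroll.
```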
