
I am looking for something like TRIM() in Python, but .strip() doesn't accomplish this. Here's an example:

>>> s.strip()
'Elvis Presley made his film debut in this tale of three brothers who, 
 while serving in the Confederate Army, steal a Union Army payroll. \xc2\xa0'

>>> s2.strip()
'Elvis Presley made his film debut in this tale of three brothers who, 
 while serving in the Confederate Army, steal a Union Army payroll.'

>>> s.strip()==s2.strip()
False

How would I accomplish the above -- trimming all whitespace characters at the edges of the text -- so that s.trim() == s2.trim() (other than just doing a hackish s.strip('\xc2\xa0').strip())?

David542
  • Related: https://stackoverflow.com/questions/10993612/python-removing-xa0-from-string#11566398 – dfundako Sep 20 '18 at 21:29
  • What version of Python are you using? (2 or 3) – payne Sep 20 '18 at 21:30
  • @payne 2.7 is the version. – David542 Sep 20 '18 at 21:31
  • There is [`string.whitespace`](https://docs.python.org/3/library/string.html#string.whitespace) defining what Python considers a whitespace. `\xa0` doesn't belong in that list, tho. Even `\s` in its regex engine won't recognize it as whitespace so you'll have to do your own _hackish_ approach to remove the chars you want treated as whitespace. – zwer Sep 20 '18 at 21:34

2 Answers


Since you are using Python 2.7, first convert your string to unicode and then strip:

s = unicode('test \xc2\xa0', "UTF-8")
s.strip()

yields:

u'test'

This will cause Python to recognize the \xc2\xa0 as a Unicode non-breaking space character, and properly trim it.

Without that, Python 2.7 treats the value as a plain byte string (str), and str.strip() only removes ASCII whitespace; the individual bytes \xc2 and \xa0 aren't whitespace there.
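Applied to the two strings from the question, the same idea might look like the sketch below. The byte literals are shortened stand-ins for s and s2 (an assumption, since the full values are not shown), and .decode('utf-8') is used in place of unicode(..., "UTF-8") so that the snippet also runs on Python 3.

```python
# Shortened stand-ins for the question's s and s2 (hypothetical values).
s = b'steal a Union Army payroll. \xc2\xa0'
s2 = b'steal a Union Army payroll.'

# Byte-level strip() only knows ASCII whitespace, so \xc2\xa0 survives.
assert s.strip() != s2.strip()

# Decoding turns the two bytes \xc2\xa0 into the single code point U+00A0,
# which the text-level strip() treats as whitespace.
text = s.decode('utf-8')
text2 = s2.decode('utf-8')
assert text.strip() == text2.strip()

# Re-encode if the rest of the program still expects UTF-8 bytes.
trimmed = text.strip().encode('utf-8')
```

Decoding once at the input boundary and working with text from then on avoids having to repeat this step for every string operation.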

payne

I would suggest using the string's replace method, followed by strip to remove the ordinary whitespace that is left over. You can do this:

s = s.replace('\xc2', '').replace('\xa0', '').strip()

You could encapsulate this logic if you have a large number of possible characters you want to trim off:

def replace_many(base_string, *to_remove):
    result = base_string
    for r in to_remove:
        result = result.replace(r, '')
    return result

>>> replace_many(s, '\xc2', '\xa0').strip() == s2.strip()
True

You could also implement this using reduce:

# In Python 2 (reduce takes the initial value positionally, not as a keyword)
result = reduce(lambda a, r: a.replace(r, ''), ['\xc2', '\xa0'],
    base_string).strip()

# In Python 3 (base_string is assumed to be bytes, hence the b'' literals)
import functools
result = functools.reduce(lambda a, r: a.replace(r, b''), [b'\xc2', b'\xa0'],
    base_string).strip()
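
Note that replace removes the bytes wherever they occur, not only at the edges. If interior occurrences should be kept, one possible variation is to pass the extra bytes to strip, which only touches the two ends. This is a minimal sketch: the helper name strip_edges and the sample value are hypothetical, and because strip works byte by byte it could damage a leading multi-byte character that happens to start with \xc2, so decoding to Unicode first is the more robust route.

```python
import string

def strip_edges(raw, extra=b'\xc2\xa0'):
    """Strip ASCII whitespace plus every byte in `extra` from both ends."""
    ascii_ws = string.whitespace.encode('ascii')
    return raw.strip(ascii_ws + extra)

# Hypothetical sample: a non-breaking space inside and at the end.
sample = b'  a\xc2\xa0b \xc2\xa0'
edge_trimmed = strip_edges(sample)  # the interior \xc2\xa0 is kept
```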
Woody1193