Weird problem in python with removing \xa0 and other encoding when adding items to list

Question

I have the following code:

x = "$30.00\xa0USD\u202c"
e = x.strip()
print (e)
group = ["a", e.strip(), "b"]
print (group)

This is the result it gives me:

$30.00 USD‬
['a', '$30.00\xa0USD\u202c', 'b']

I want to remove the "\xa0" from the items that I add to the list, but .strip() doesn't seem to be working. How do I solve this?

[`strip`](https://docs.python.org/3/library/stdtypes.html?#str.strip) without parameters returns a copy of the string with the leading and trailing whitespaces removed. `\xa0` isn't leading or trailing. Maybe you want to use `replace` instead? — Matthias, Jun 30 '21 at 09:28
[how-to-remove-xa0-from-string-in-python](https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python) `'\xa0'` is actually non-breaking space in Latin1 (ISO 8859-1), also chr(160). You should replace it with a space. `string = string.replace(u'\xa0', u' ')` — Patrick Artner, Jun 30 '21 at 09:32

score 1 · Answer 1 · answered Jun 30 '21 at 09:31

strip() strips whitespace from both ends of the string, and the string you passed in has no whitespace at either end: the \xa0 sits in the middle, and the final character, \u202c, is a formatting character rather than whitespace. It's not clear what you hoped would happen or why; probably the proper solution is to fix whatever produced that string in the first place.
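A short check of that claim, using the string from the question. It also shows why the two print calls look different: printing a list displays the repr() of each item, which escapes non-printable characters, so the string itself is the same in both cases. The names shown_by_print and shown_in_list are only illustrative.

```python
x = "$30.00\xa0USD\u202c"

# strip() with no arguments only removes whitespace at the two ends.
# The last character is U+202C, which is not whitespace, so nothing changes.
assert x.strip() == x
assert not "\u202c".isspace()
assert "\xa0".isspace()  # whitespace, but it sits in the middle of the string

# print(x) writes the characters themselves; print(a_list) shows repr() of
# each item, and repr() escapes non-printable characters such as \xa0.
shown_by_print = str(x)
shown_in_list = repr(x)
print(shown_in_list)  # '$30.00\xa0USD\u202c'
```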

If you want to discard \xa0 then ... say so.

x = x.replace('\xa0', '')

If you want to extract only plain printable ASCII from the string, maybe try a regular expression.

import re
x = ' '.join(re.findall('[ -~]+', x))
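Applied to the string from the question, that expression behaves as sketched below. The second example is an added illustration (not from the question) of the trade-off: any non-ASCII character is dropped, including ones you may want to keep.

```python
import re

x = "$30.00\xa0USD\u202c"

# Keep only runs of characters from space (0x20) to tilde (0x7E),
# then join the runs with a single space.
ascii_only = " ".join(re.findall("[ -~]+", x))
print(ascii_only)  # $30.00 USD

# Caveat: legitimate non-ASCII characters disappear too.
euro = " ".join(re.findall("[ -~]+", "€30.00"))
print(euro)  # 30.00
```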

If you want to strip \u202c, you can do that too, of course.

x = x.strip('\u202c\u202f')

(I threw in U+202F too just to show that it's easy to do.)
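For the list in the question, these pieces could be combined along the following lines. This is only a sketch: clean is a hypothetical helper name, replacing \xa0 with an ordinary space (rather than deleting it) is an assumption borrowed from the comment under the question, and dropping every character in Unicode category Cf is one possible way to catch invisible formatting characters like U+202C.

```python
import unicodedata


def clean(text: str) -> str:
    """Hypothetical helper: tidy a string before adding it to a list."""
    # Assumption: a regular space is wanted where the non-breaking space was.
    text = text.replace("\xa0", " ")
    # Drop invisible format characters (Unicode category Cf), e.g. U+202C.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Now that the ends are ordinary characters, strip() can do its usual job.
    return text.strip()


x = "$30.00\xa0USD\u202c"
group = ["a", clean(x), "b"]
print(group)  # ['a', '$30.00 USD', 'b']
```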

But again, the stray \xa0 and \u202c in your input string are likely a sign of corruption earlier in your processing. (In a Python 3 str, \xa0 is simply the code point U+00A0, a non-breaking space, not a raw byte.) If you can fix that, maybe you will no longer need any of these.
