I have this code fragment (Python 2.7):
from bs4 import BeautifulSoup
content = ' foo bar';
soup = BeautifulSoup(content, 'html.parser')
w = soup.get_text()
At this point w
has a byte with value 160 in it, but it's encoding is ASCII
.
How do replace all of the \xa0 bytes by another character?
I've tried:
w = w.replace(chr(160), ' ')
w = w.replace('\xa0', ' ')
but I am getting the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
And why does BS return an ASCII
encoded string with an invalid character in it?
Is there a way to convert w
to a 'latin1` encoded string?