In my case, I specifically want to remove the „ and ” characters from a string. I use BeautifulSoup to parse certain HTML paragraphs and extract a substring from each of them. So far my code looks like this:
    # -*- coding: cp1252 -*-
    from bs4 import BeautifulSoup as bs
    import re

    soup = bs(open("file.xhtml"), 'html.parser')
    for tag in soup.find_all('p', {"class": "fnp2"}):
        line = unicode(str(tag).split(':')[0], "utf-8")
        line = re.sub('(<p class="fnp2">)(\d+) ', '', line)
        line = line.replace('„', '')
        print line
However, this always raises a UnicodeDecodeError:
    line = line.replace('„', '')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)
What would be a solution for this?
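For context, a possible direction, offered as a sketch rather than a confirmed fix: byte 0x84 is „ in cp1252, which suggests that Python 2 is trying to decode the byte-string literal '„' as ASCII because `line` is already a unicode object. Using unicode literals for the characters to remove should avoid that implicit decode. The helper name `strip_quotes` and the sample string below are made up for illustration, and the quotes are written as escapes (U+201E, U+201D) so the result does not depend on the source-file encoding.

```python
# -*- coding: utf-8 -*-
import re

# „ is U+201E and ” is U+201D; escapes keep this independent of the
# encoding the source file is saved in.
QUOTES = u"\u201e\u201d"


def strip_quotes(text):
    """Return `text` (a unicode string) with „ and ” removed.

    Hypothetical helper, not part of BeautifulSoup or the original code.
    """
    return re.sub(u"[" + QUOTES + u"]", u"", text)


# Made-up sample standing in for the text of one <p class="fnp2"> tag.
sample = u"\u201eHallo Welt\u201d: rest"
cleaned = strip_quotes(sample.split(u":")[0])
print(cleaned)  # prints: Hallo Welt
```

In the loop from the question, the same idea would presumably mean passing the tag's text as unicode (for example via `tag.get_text()`, which returns unicode) instead of round-tripping through `str(tag)`, and replacing `'„'` with `u'\u201e'`.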