I'm a little confused about how to unescape characters in Python. I am parsing some HTML using BeautifulSoup, and when I retrieve the text content it looks like this:
\u00a0\n\n\n\r\nState-of-the-art security and 100% uptime SLA.\u00a0\r\n\n\n\r\nOutstanding support
I'd like for it to look like this:
State-of-the-art security and 100% uptime SLA. Outstanding support
Here is my code:
# Imports assumed by this snippet
import requests
from bs4 import BeautifulSoup
from lxml import html

self.__page = requests.get(url)
self.__soup = BeautifulSoup(self.__page.content, "lxml")
self.__page_cleaned = self.__removeTags(self.__page.content)  # remove script and style tags
self.__tree = html.fromstring(self.__page_cleaned)  # contains the page HTML in a tree structure
page_data = {}
page_data["content"] = self.__tree.text_content()
How do I remove those escaped characters (\u00a0, \r and \n)? I've searched extensively, and nothing I've tried has worked.
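
A minimal sketch of one possible cleanup. It assumes that the \u00a0, \r and \n sequences are real whitespace characters that only appear escaped when the string is displayed (for example via repr() or JSON output), not literal backslashes stored in the text. The helper name normalize_whitespace is hypothetical and not part of the code above:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which covers \r, \n and the non-breaking space \u00a0.
    return " ".join(text.split())


# Sample input copied from the output shown in the question.
raw = (
    "\u00a0\n\n\n\r\nState-of-the-art security and 100% uptime SLA."
    "\u00a0\r\n\n\n\r\nOutstanding support"
)
print(normalize_whitespace(raw))
```

Under that assumption, the same helper could be applied to the result of text_content() before storing it in page_data["content"]. If the string really does contain literal backslash sequences, it would need to be decoded first, which this sketch does not attempt.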