
I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like:

I don't know what I am talking about

I can't seem to find any Python library that will replace those characters with what they should be, so that the resulting string looks like this:

I don't know what I am talking about

The closest I've gotten was

urllib.unquote(post_content).decode('utf-8')

But that still does not replace the URL-encoded character with a '. Does anyone know a good way to replace those encoded characters with the ASCII characters they represent? I also get other errors, such as ( and ) showing up in encoded form instead of as ( and ).

  • This question is more suited to Stack Overflow. Programmers SE is about program design issues, not specific questions about source code. – logc Mar 16 '15 at 14:01

1 Answer


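Those strings are HTML entities, not URL-encoded characters, which is why urllib.unquote leaves them unchanged. You can decode them as described in the Stack Overflow question "Decode HTML entities in Python string?". In Python 3.4 and later, use the function unescape from the module html (html.unescape). In Python 2, use HTMLParser.HTMLParser().unescape from the module HTMLParser.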
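As a rough illustration, here is a minimal Python 3 sketch of that approach. The helper name decode_entities and the sample string are made up for the example, and the entities in your actual feed may differ (for instance &#8217; instead of &#39;), so treat it as a starting point rather than a drop-in fix.

```python
import html
from urllib.parse import unquote


def decode_entities(text: str) -> str:
    """Replace HTML entities (e.g. &#39; or &amp;) with the characters they stand for."""
    return html.unescape(text)


# Hypothetical sample; the real feed content may use different entities.
sample = "I don&#39;t know what I am talking about &#40;yet&#41;"

# URL-unquoting only handles %XX escapes, so the entities pass through untouched.
url_unquoted = unquote(sample)

# Entity decoding is what actually restores the apostrophe and parentheses.
decoded = decode_entities(sample)
print(decoded)
```

If the text comes out of BeautifulSoup, you would apply the same call to the string you extract from the tag. Note that html.unescape does a single pass, so double-escaped content such as &amp;#39; needs a second call.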

Community
jkd