Python: Replace URLEncoded characters in String with what they represent

Question

I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like:

I don&#39;t know what I am talking about

I can't seem to find any Python library that will replace those characters with what they should be, so that the resulting string looks like this:

I don't know what I am talking about

The closest I've gotten was

urllib.unquote(post_content).decode('utf-8')

But that still does not replace the url encoded character with a '. Does anyone know a good way to convert those urlencoded characters into the ASCII characters they represent? There are also other errors that I get, like ( and ) appearing as similar escape codes instead of the parentheses themselves.

This question is more suited to Stack Overflow. Programmers SE is about program design issues, not specific questions about source code. — logc, Mar 16 '15 at 14:01

score 0 · Answer 1 · edited May 23 '17 at 11:50

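Those weird strings are called HTML entities. They are not URL-encoded characters, which is why urllib.unquote leaves them untouched. You can decode them as described in the linked question: Decode HTML entities in Python string?. In Python 3.4 and later, use the function unescape from the module html; in Python 2, HTMLParser.HTMLParser().unescape does the same job.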
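As a rough sketch of the difference, assuming Python 3.4 or later: the snippet below runs the sample string from the question through both URL decoding and HTML entity decoding. The names raw and decode_entities are made up for illustration and are not part of any library.

```python
import html
from urllib.parse import unquote

# Sample string from the question, containing a numeric HTML entity.
raw = "I don&#39;t know what I am talking about"

# URL decoding only handles percent-escapes such as %27,
# so the &#39; entity passes through unchanged.
url_decoded = unquote(raw)

# HTML entity decoding turns &#39; into an apostrophe.
entity_decoded = html.unescape(raw)


def decode_entities(text):
    """Hypothetical helper: decode HTML entities in a piece of feed text."""
    return html.unescape(text)


print(url_decoded)
print(entity_decoded)
print(decode_entities("&#40;like this&#41;"))
```

The same call should also cover the parentheses mentioned in the question, as long as they arrive as entities. If the feed text is double-escaped (for example &amp;#39;), it would probably need a second pass through unescape.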

edited May 23 '17 at 11:50 by Community

answered Mar 16 '15 at 04:28 by jkd
