Python URL Characters

Question

I am really new to Python and coding in general, but I have been making some good strides.

I am able to pull some data off the web through an API, and the result should be a string. What I am seeing, though, are some instances of sequences such as "& amp;" and " &quot". (I modified the character sets so they would print properly to the screen.)

I figure there is a way to clean this string and remove these characters so that it looks the way it does on a computer screen. I tried searching for urldecoding, but admittedly I don't even know if that is the solution.

Any help on how to remove these "extra" characters and produce a readable string will be greatly appreciated!

Many thanks in advance,

Brock

See http://stackoverflow.com/questions/1208916/decoding-html-entities-with-python The keyword is `HTML entity/ies`. Many python libraries help you convert or deal with these in various ways. — mjv, Feb 18 '10 at 04:01
Where are you getting these data? Presumably these are part of an HTML or XML file, and in parsing it your parser should automatically unescape it for you. — Mike Graham, Feb 18 '10 at 04:52

score 2 · Accepted Answer · answered Feb 18 '10 at 04:02

xml.sax.saxutils.unescape(data[, entities]): Unescape '&amp;', '&lt;', and '&gt;' in a string of data.

You can unescape other strings of data by passing a dictionary as the optional entities parameter. The keys and values must all be strings; each key will be replaced with its corresponding value. '&amp;', '&lt;', and '&gt;' are always unescaped, even if entities is provided.
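
As a minimal sketch of how that might look for the strings in the question: the sample text and the `extra_entities` name below are made up for illustration, not taken from the asker's API. Since `&quot;` is not handled by default, it is passed in through the optional entities dictionary. On Python 3.4 or later, `html.unescape` from the standard library is an alternative that covers all named HTML entities without a mapping.

```python
from xml.sax.saxutils import unescape
import html  # html.unescape requires Python 3.4 or later

# Hypothetical sample standing in for the string returned by the API.
raw = "Tom &amp; Jerry said &quot;hi&quot; &lt;loudly&gt;"

# &amp;, &lt; and &gt; are unescaped by default; anything else,
# such as &quot;, has to be supplied in the entities dictionary.
extra_entities = {"&quot;": '"', "&apos;": "'"}

cleaned = unescape(raw, extra_entities)
print(cleaned)  # Tom & Jerry said "hi" <loudly>

# Alternative: html.unescape handles named and numeric HTML entities itself.
print(html.unescape(raw))
```

If the data contains entities beyond the handful listed in the dictionary (for example `&nbsp;` or numeric references like `&#39;`), the `html.unescape` route is probably the less error-prone choice.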
