I am really new to Python and coding in general, but I have been making some good strides.

I am able to pull some data off of the web through an API, and the result should be a string. What I am seeing, though, are some instances such as "& amp;" and " &quot". (I modified the character sets so they would print properly to the screen.)

I figure there is a way to clean this string and remove the characters so that it looks the way it does on a computer screen. I tried searching for URL decoding, but admittedly I don't even know if that is the solution.

Any help on how to remove these "extra" characters and produce a readable string will be greatly appreciated!

Many thanks in advance,

Brock

mjv
Btibert3
  • See http://stackoverflow.com/questions/1208916/decoding-html-entities-with-python The keyword is `HTML entity/ies`. Many Python libraries help you convert or deal with these in various ways. – mjv Feb 18 '10 at 04:01
  • Where are you getting these data? Presumably these are part of an HTML or XML file, and in parsing it your parser should automatically unescape it for you. – Mike Graham Feb 18 '10 at 04:52

1 Answer

xml.sax.saxutils.unescape(data[, entities]): Unescape '&amp;', '&lt;', and '&gt;' in a string of data.

You can unescape other strings of data by passing a dictionary as the optional entities parameter. The keys and values must all be strings; each key will be replaced with its corresponding value. '&amp;', '&lt;', and '&gt;' are always unescaped, even if entities is provided.
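
As a rough illustration of the above, here is a minimal sketch in Python 3. The sample string `raw` is made up for the example and is not the asker's actual API data. It shows the default behaviour, the optional entities dictionary used to cover '&quot;', and, as a separate alternative not mentioned in this answer, the standard library's `html.unescape` (available in Python 3.4 and later), which decodes the full set of HTML entities.

```python
import html
from xml.sax.saxutils import unescape

# Hypothetical sample input; the real API response is not shown in the question.
raw = 'Tom &amp; Jerry said &quot;hi&quot; &lt;b&gt;'

# By default only &amp;, &lt; and &gt; are converted, so &quot; is left alone.
basic = unescape(raw)

# Extra entities are passed as a {entity: replacement} dictionary.
with_quotes = unescape(raw, {"&quot;": '"', "&apos;": "'"})

# Alternative (assumes Python 3.4+): decodes all named and numeric HTML entities.
full = html.unescape(raw)

print(basic)
print(with_quotes)
print(full)
```

If the data may contain entities beyond the few listed in the dictionary, `html.unescape` is probably the simpler choice, since it does not require listing each entity by hand.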

pwdyson