3

I am trying to implement a function which replaces the following values:

# > with &gt;
# < with &lt;
# " with &quot;
# & with &amp;

I keep getting an error with my function. What exactly is wrong?

def escape_html(s):
    data = list(s)
    if ">" in data:
        data.replace(">","&gt;") 
    if "<" in data:
        data.replace("<","&lt;") 
    if '"' in data:
        data.replace('"',"&quot;") 
    if "&" in data:
        data.replace("&","&amp;") 
    word = data.join()
    return word

print escape_html("<>")

Note: This is more of a fundamental programming question. My focus is the reason why my function isn't working. I cannot use outside libraries for this project.

KishB87
  • See http://stackoverflow.com/q/1061697/2870069, http://stackoverflow.com/q/3096948/2870069, http://stackoverflow.com/q/11336384/2870069 and others... – Jakob Jan 04 '14 at 07:19
  • This is more of a programming-fundamentals question than a question about the most efficient way of solving the problem. I'm more worried about why my code doesn't work. – KishB87 Jan 04 '14 at 07:21
  • 1
    If this is all that you want to replace, use 'html escape' as others have suggested. However, if you want to stick with this method, you should replace & first, because new &'s are created when you replace in the first three cases. – Frank Cangialosi Jan 04 '14 at 07:22

5 Answers

8

Use cgi.escape:

>>> import cgi
>>> cgi.escape('<this & that>')
'&lt;this &amp; that&gt;'

If you use Python 3.2+, use html.escape instead, as the documentation suggests:

cgi.escape

Deprecated since version 3.2: This function is unsafe because quote is false by default, and therefore deprecated. Use html.escape() instead.
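For comparison, a minimal sketch of the html.escape call on Python 3.2+. The sample string is only an illustration; the quote parameter controls whether quote characters are escaped as well, and it defaults to True:

```python
import html  # standard library, Python 3.2+

# By default, quote characters are escaped along with &, < and >.
print(html.escape('<this & "that">'))
# &lt;this &amp; &quot;that&quot;&gt;

# With quote=False, only &, < and > are escaped.
print(html.escape('<this & "that">', quote=False))
# &lt;this &amp; "that"&gt;
```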

falsetru
4

There are standard-library functions for this. You can use cgi.escape if you are using Python 2.x; it has been deprecated since Python 3.2. So, if you are using Python >= 3.2, use html.escape instead.

thefourtheye
1

You could also use replace, which is a bit more universal.

For example,

string = ">>>"
new_string = string.replace(">", "&gt;")
print new_string # '&gt;&gt;&gt;'

However, keep in mind that to replace a double quote you need to enclose it in single quotes, and vice versa.
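As for why the function in the question raises an error: list(s) produces a list, and lists have no replace method, so the first data.replace call raises an AttributeError. The final data.join() would fail as well, since join is a string method called on the separator, as in "".join(data). A minimal sketch of a corrected version, assuming the four replacements listed in the question are all that is needed:

```python
def escape_html(s):
    # Strings are immutable: str.replace returns a new string, so the
    # result has to be assigned back. No list(s) or join is needed.
    s = s.replace("&", "&amp;")  # first, so the "&" in later entities is left alone
    s = s.replace(">", "&gt;")
    s = s.replace("<", "&lt;")
    s = s.replace('"', "&quot;")
    return s

print(escape_html("<>"))  # &lt;&gt;
```

The `if ... in data` checks can go too, because replace simply returns the string unchanged when there is nothing to replace.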

Frank Cangialosi
  • 1
    There really is no need to be universal between Python 2.x and 3.x, because the two aren't compatible in the first place. – samrap Jan 04 '14 at 07:25
  • I just meant in the event that he wanted to replace something other than those 4 characters. I realize that the context is dealing with HTML content, but just thought it might be worth noting that HTML escape only works for those chars, and won't be helpful for any others. – Frank Cangialosi Jan 04 '14 at 07:26
  • That's exactly why I voted you up :) – samrap Jan 04 '14 at 07:29
0

def escape_html(data):
    return data.replace("&","&amp;").replace('"',"&quot;").replace(">","&gt;").replace("<","&lt;")
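The order of the chained calls matters: "&" has to be replaced first, otherwise the ampersands introduced by the other entities get escaped a second time. A small sketch of the difference, using two hypothetical helper names that are not part of the answer and only two of the four replacements for brevity:

```python
def escape_amp_first(data):
    # "&" is handled before any entity is introduced.
    return data.replace("&", "&amp;").replace("<", "&lt;")

def escape_amp_last(data):
    # Wrong ordering, shown for comparison only: the "&" in "&lt;"
    # is escaped again by the second call.
    return data.replace("<", "&lt;").replace("&", "&amp;")

print(escape_amp_first("a < b"))  # a &lt; b
print(escape_amp_last("a < b"))   # a &amp;lt; b
```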
Omid Raha
0

You could use xml.sax.saxutils, which provides an escape function. See escaping HTML.
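A minimal sketch of that call, with made-up sample strings. By default escape only handles &, < and >, so the double quote is passed through the optional entities dictionary here:

```python
from xml.sax.saxutils import escape  # standard library

# Default behaviour: only &, < and > are escaped.
print(escape('<this & that>'))
# &lt;this &amp; that&gt;

# Extra replacements can be supplied as a dictionary.
print(escape('say "hi" & <go>', {'"': "&quot;"}))
# say &quot;hi&quot; &amp; &lt;go&gt;
```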

Jakob