How to do HTML escaping in Python?

Question

I am trying to implement a function which replaces the following values:

# > with &gt;
# < with &lt;
# " with &quot;
# & with &amp;

I keep getting an error with my function. What exactly is wrong?

def escape_html(s):
    data = list(s)
    if ">" in data:
        data.replace(">","&gt;") 
    if "<" in data:
        data.replace("<","&lt;") 
    if '"' in data:
        data.replace('"',"&quot;") 
    if "&" in data:
        data.replace("&","&amp;") 
    word = data.join()
    return word

print escape_html("<>")

Note: This is more of a fundamental programming question. My focus is the reason why my function isn't working. I cannot use outside libraries for this project.

See http://stackoverflow.com/q/1061697/2870069, http://stackoverflow.com/q/3096948/2870069, http://stackoverflow.com/q/11336384/2870069 and others... — Jakob, Jan 04 '14 at 07:19
This is more of a programming fundamentals question than a question about the most efficient way to solve the problem. I'm more concerned with why my code doesn't work. — KishB87, Jan 04 '14 at 07:21
If this is all that you want to replace, use 'html escape' as others have suggested. However, if you want to stick with this method, you should replace & first, because new &'s are created when you replace in the first three cases. — Frank Cangialosi, Jan 04 '14 at 07:22

falsetru · Answer 1 · 2014-01-04T07:18:36.183

8

Use cgi.escape:

>>> import cgi
>>> cgi.escape('<this & that>')
'&lt;this &amp; that&gt;'

If you use Python 3.2+, use html.escape instead, as the documentation suggests:

cgi.escape

Deprecated since version 3.2: This function is unsafe because quote is false by default, and therefore deprecated. Use html.escape() instead.
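
As a minimal sketch of the replacement the documentation points to (Python 3 syntax; the sample strings are only illustrations), html.escape escapes quotes by default and lets you switch that off with quote=False. Note that cgi.escape was removed entirely in Python 3.8, so on current interpreters html.escape is the only one of the two that still exists.

```python
import html

# html.escape handles &, < and > and, by default, quotes as well.
escaped = html.escape('<this & that>')
quoted = html.escape('say "hi"')                  # quote=True is the default
unquoted = html.escape('say "hi"', quote=False)   # mimics the old cgi.escape default

print(escaped)    # &lt;this &amp; that&gt;
print(quoted)     # say &quot;hi&quot;
print(unquoted)   # say "hi"
```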

edited Jan 04 '14 at 07:18

answered Jan 04 '14 at 07:13

falsetru


Could you please show an example where `quote` makes a difference? – Martin Thoma Sep 26 '17 at 09:51
@MartinThoma, `cgi.escape('"')` returns `'"'` while `html.escape('"')` returns `'&quot;'`. – falsetru Sep 26 '17 at 13:44

thefourtheye · Answer 2 · 2014-01-04T07:20:15.487

4

There are built-in functions for this. If you are using Python 2.x, you can use cgi.escape. It is deprecated as of Python 3.2, so if you are using Python >= 3.2, use html.escape instead.
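
If the same code has to run on both Python 2.x and Python 3.2+, one possible approach is an import fallback. This is only a sketch: the wrapper name escape_html_compat is hypothetical, and the two functions are not identical (with quote=True, cgi.escape escapes only double quotes, while html.escape also escapes single quotes).

```python
try:
    from html import escape   # Python 3.2 and later
except ImportError:
    from cgi import escape    # Python 2.x fallback


def escape_html_compat(text):
    # Hypothetical wrapper: pass quote=True explicitly so that double
    # quotes are escaped regardless of which function was imported.
    return escape(text, quote=True)


print(escape_html_compat('<"&">'))  # &lt;&quot;&amp;&quot;&gt;
```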

edited Jan 04 '14 at 07:20

answered Jan 04 '14 at 07:15

thefourtheye


score 1 · Answer 3 · answered Jan 04 '14 at 07:21

1

You could also use replace, which is a bit more universal.

For example,

string = ">>>"
new_string = string.replace(">", "&gt;")
print(new_string)  # &gt;&gt;&gt;

However, keep in mind that if you're trying to replace double quotes, the simplest way to write them is inside a single-quoted string, and vice versa (or escape the quote with a backslash).
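
Applied to the function in the question, the replace approach might look like the sketch below (Python 3 print syntax). It assumes the original failed for three reasons: a list has no replace method, so data.replace raises AttributeError; str.replace returns a new string instead of modifying the old one, so its result has to be assigned; and join is a method of the separator string, as in "".join(data), not of the list.

```python
def escape_html(s):
    # Work on the string directly; no list() or join() is needed.
    # str.replace returns a new string, so assign the result back.
    s = s.replace("&", "&amp;")    # first, so later entities are not re-escaped
    s = s.replace(">", "&gt;")
    s = s.replace("<", "&lt;")
    s = s.replace('"', "&quot;")
    return s


print(escape_html('<a href="x">&</a>'))
# &lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;
```

The "if ... in data" checks from the question are dropped because replace simply returns the string unchanged when there is nothing to replace.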

answered Jan 04 '14 at 07:21

Frank Cangialosi



There really is no need to be universal between Python 2.x and 3.x because the two aren't compatible in the first place – samrap Jan 04 '14 at 07:25
I just meant in the event that he wanted to replace something other than those 4 characters. I realize that the context is for dealing with HTML content, but just thought it might be worth noting that HTML escape only works for those chars, and won't be helpful for any others – Frank Cangialosi Jan 04 '14 at 07:26
That's exactly why I voted you up :) – samrap Jan 04 '14 at 07:29

Omid Raha · Answer 4 · 2014-01-04T07:30:13.203

0

def escape_html(data):
    return data.replace("&","&amp;").replace('"',"&quot;").replace(">","&gt;").replace("<","&lt;")

edited Jan 04 '14 at 07:30

answered Jan 04 '14 at 07:22

Omid Raha


Order of replace is important. – Omid Raha Jan 04 '14 at 07:32
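
To illustrate why the order matters, here is a small sketch with two hypothetical helper functions. If & is replaced after <, the ampersand that was just introduced by &lt; gets escaped a second time.

```python
def escape_wrong_order(text):
    # "&" is handled last, so the "&" inside "&lt;" is escaped again.
    return text.replace("<", "&lt;").replace("&", "&amp;")


def escape_right_order(text):
    # "&" is handled first, so only ampersands from the input are escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;")


print(escape_wrong_order("<"))  # &amp;lt;
print(escape_right_order("<"))  # &lt;
```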

score 0 · Answer 5 · answered Jan 04 '14 at 07:25

0

You could use xml.sax.saxutils which ships an escape method. See escaping HTML.
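
A short sketch of how that might look (the sample string is only an illustration). By default xml.sax.saxutils.escape handles only &, < and >; other characters, such as the double quote from the question, have to be passed in through the optional entities dictionary.

```python
from xml.sax.saxutils import escape

text = '<a & "b">'

# Default behaviour: only &, < and > are escaped.
default_result = escape(text)

# Extra replacements are supplied as a dict of {character: entity}.
with_quotes = escape(text, {'"': "&quot;"})

print(default_result)  # &lt;a &amp; "b"&gt;
print(with_quotes)     # &lt;a &amp; &quot;b&quot;&gt;
```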

answered Jan 04 '14 at 07:25

Jakob

