
I want to replace ">" with "&gt;", and do the same for other symbols.

def escape_html(s):
    s = s.replace(">", r"&gt;")
    s = s.replace("<", "&lt;")
    s = s.replace('"', "&quot;")
    s = s.replace('&', "&amp;")
    return s

print escape_html(">")

The result is &amp;gt;

But I need &gt;

Could you help me understand why the raw string doesn't help here? And how should I write the code?

Bhargav Rao
Michael

1 Answer


You need to replace & first:

def escape_html(s):
    s = s.replace('&', "&amp;")
    s = s.replace(">", "&gt;")
    s = s.replace("<", "&lt;")
    s = s.replace('"', "&quot;")
    return s

Otherwise, the final replacement also escapes the & in every entity the earlier replacements inserted, so &gt; becomes &amp;gt;. This has nothing to do with Python raw string literals; the r prefix only disables \-style escapes.
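To see the ordering problem in isolation, here is a minimal sketch that applies just two of the replacements in both orders. The helper names escape_amp_last and escape_amp_first are made up for this illustration and are not part of any library.

```python
def escape_amp_last(s):
    # Order from the question: '&' is escaped after '>', so the '&'
    # inside the freshly inserted '&gt;' is escaped a second time.
    s = s.replace(">", "&gt;")
    s = s.replace("&", "&amp;")
    return s


def escape_amp_first(s):
    # '&' is escaped first, so only ampersands that were present in the
    # original input are touched.
    s = s.replace("&", "&amp;")
    s = s.replace(">", "&gt;")
    return s


print(escape_amp_last(">"))   # &amp;gt;
print(escape_amp_first(">"))  # &gt;
```

The same reasoning applies to the < and " replacements: whichever entity is inserted before the & step gets its ampersand escaped again.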

You could also just use the cgi.escape() function; set the second argument to True to have it escape quotes.

Demo:

>>> def escape_html(s):
...     s = s.replace('&', "&amp;")
...     s = s.replace(">", "&gt;")
...     s = s.replace("<", "&lt;")
...     s = s.replace('"', "&quot;")
...     return s
... 
>>> escape_html('<script>alert("Oops & bummer!")</script>')
'&lt;script&gt;alert(&quot;Oops &amp; bummer!&quot;)&lt;/script&gt;'
>>> import cgi
>>> cgi.escape('<script>alert("Oops & bummer!")</script>', True)
'&lt;script&gt;alert(&quot;Oops &amp; bummer!&quot;)&lt;/script&gt;'
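Note for Python 3 readers: cgi.escape() was deprecated in Python 3.2 and removed in 3.8. The standard-library replacement is html.escape(), which escapes quotes by default (quote=True), including single quotes. A minimal sketch, assuming Python 3 and reusing the sample string from the demo above:

```python
import html

text = '<script>alert("Oops & bummer!")</script>'

# html.escape handles '&' before the other characters internally,
# so there is no double escaping.
escaped = html.escape(text)
print(escaped)
# &lt;script&gt;alert(&quot;Oops &amp; bummer!&quot;)&lt;/script&gt;
```

Pass quote=False if quotes should be left alone, for example when the text will never end up inside an attribute value.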
Martijn Pieters
  • @Michael See an old [post](http://stackoverflow.com/questions/28775049/most-efficient-way-to-replace-multiple-characters-in-a-string/28775426#28775426) of mine for more on the matter. – Malik Brahimi Mar 12 '15 at 20:40