I want to encode a string to its corresponding HTML entities, but so far I have not been able to. As the question title says, I want every character in a string converted to its corresponding HTML entity (both numeric and named forms). Following the documentation, I tried:
In [31]: import html
In [32]: s = '<img src=x onerror="javascript:alert("XSS")">'
In [33]: html.escape(s)
Out[33]: '&lt;img src=x onerror=&quot;javascript:alert(&quot;XSS&quot;)&quot;&gt;'
But I want every character to be converted, not just '<', '>', '&', etc.
Also, html.escape only produces entity names, not entity numbers, and I want both.
Surprisingly, though, html.unescape converts all entities back into their corresponding characters.
In [34]: a = '<img src=x onerror="javascript
...: 8alert('XSS')">'
In [35]: html.unescape(a)
Out[35]: '<img src=x onerror="javascript:alert(\'XSS\')">'
So can I do the same with html.escape?
I am really surprised that none of the resources I found online for encoding and decoding HTML entities encode all characters, and PHP's htmlspecialchars() function doesn't do that either. I also don't want to write out every HTML entity number from here character by character.
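To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming it is acceptable to build the entities by hand from each character's code point. The function names encode_all_numeric and encode_all_named are hypothetical helpers, not part of the html module. The named variant relies on html.entities.codepoint2name, which only covers the HTML 4 entity names, so characters without a name fall back to the numeric form.

```python
import html
from html.entities import codepoint2name


def encode_all_numeric(text):
    """Encode every character as a decimal numeric entity, e.g. '<' -> '&#60;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def encode_all_named(text):
    """Use a named entity where HTML 4 defines one, else a numeric entity."""
    parts = []
    for ch in text:
        name = codepoint2name.get(ord(ch))  # None if no HTML 4 name exists
        parts.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(parts)


s = '<img src=x onerror="javascript:alert("XSS")">'

numeric = encode_all_numeric(s)
named = encode_all_named(s)

# html.unescape should reverse both encodings back to the original string.
print(html.unescape(numeric) == s)
print(html.unescape(named) == s)

print(encode_all_numeric("<a>"))  # &#60;&#97;&#62;
print(encode_all_named('<"x'))    # &lt;&quot;&#120;
```

Both variants round-trip through html.unescape, which is the property I would expect from an "escape everything" counterpart to it.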