Since you've considered replacing all < and > by the HTML entities, a good option would be to send the Content-Type: text/plain header, so that the browser displays the response as plain text instead of parsing it as HTML.
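As a rough illustration of the header approach, here is a minimal sketch of a WSGI application that serves its body as plain text. The name plain_text_app and the hard-coded body are assumptions made for this example; a real back-end would read the file from disk and may use a different framework entirely.

```python
def plain_text_app(environ, start_response):
    """Minimal WSGI app that serves its body as plain text, not HTML."""
    # Hypothetical file contents; in practice this would be read from disk.
    body = "Huge wall of text 1<a2 &>1".encode("utf-8")
    headers = [
        # text/plain tells the browser not to parse the body as HTML.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, plain_text_app).serve_forever() should be enough. No escaping is needed with this approach, because the body is never treated as markup.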
If you want to show the contents of the file inside an HTML page, replacing every & by &amp; and every < by &lt; is sufficient to correctly display the contents of the file. Example:
Input: Huge wall of text 1<a2 &>1
Output: Huge wall of text 1&lt;a2 &amp;>1
Unmodified output, displayed in a browser: Huge wall of text 11 (the <..> part is interpreted as HTML)
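The two replacements can be sketched in a few lines of Python. The function name escape_for_html is made up for this example, and the sketch assumes the text ends up in element content, not inside an attribute value.

```python
def escape_for_html(text: str) -> str:
    """Escape & and < so that text is displayed literally in HTML element content."""
    # Replace & first; otherwise the & of a freshly inserted &lt; would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


print(escape_for_html("Huge wall of text 1<a2 &>1"))
# prints: Huge wall of text 1&lt;a2 &amp;>1
```

The order matters: escaping < before & would turn &lt; into &amp;lt;. Python's standard library also offers html.escape, which additionally escapes > and, by default, quotes.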
If you cannot modify the code at the back-end (server-side), you need an HTML parser that sanitises the markup. JavaScript is not the only threat: embedded content (<object>, <iframe>, ...) can also be very malicious. Have a look at the following answer for a very detailed HTML parser and sanitizer:
Can I load an entire HTML document into a document fragment in Internet Explorer?