6

I compose a large HTML file from a huge unformatted text file. My concern is that the text file might contain malicious JavaScript code. To avoid any damage, I scan the text and replace every < and > with &lt; and &gt;. That is quite effective, but it hurts performance.

Is there some tag, attribute, or other mechanism that lets me turn JavaScript off within the HTML file? In the header, perhaps?

Fotis MC
  • Where does the HTML come from, and how do you receive it? Tell us more so that we can help; there are probably better solutions at the point where you *input* the HTML code. – JMax Oct 28 '11 at 10:46
  • I am creating the HTML myself. Actually it's a big table whose columns are filled with the data I extract from a text file. Therefore I do have control over the basic HTML file, just not what is within the columns. – Fotis MC Oct 28 '11 at 11:46

5 Answers

4

Since you have already considered replacing every < and > with HTML entities, a good alternative is to send a Content-Type: text/plain header, so that the browser displays the file as plain text and does not parse it as HTML.

If you want to show the contents of the file inside an HTML page, replacing every & with &amp; and every < with &lt; is sufficient to display the contents correctly (in element content, outside attribute values). Example:
Input: Huge wall of text 1<a2 &>1
Output: Huge wall of text 1&lt;a2 &amp;>1
Unmodified output, displaying in browser: Huge wall of text 11 (<..> interpreted as HTML)
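
As a rough illustration of the escaping described above, here is a minimal Python sketch. It assumes the table is generated by a Python script, which the question does not state; `escape_cell` and `build_row` are hypothetical helper names. The standard library's `html.escape` also escapes `>` and quotes, which is more than the minimum described above but harmless:

```python
import html
from typing import Iterable


def escape_cell(text: str) -> str:
    """Escape &, <, > and quotes so the text is displayed literally."""
    return html.escape(text, quote=True)


def build_row(cells: Iterable[str]) -> str:
    """Build one table row from untrusted cell values (hypothetical helper)."""
    return "<tr>" + "".join(f"<td>{escape_cell(c)}</td>" for c in cells) + "</tr>"


print(escape_cell("Huge wall of text 1<a2 &>1"))
# Huge wall of text 1&lt;a2 &amp;&gt;1

print(build_row(["<script>alert(1)</script>", "plain"]))
# <tr><td>&lt;script&gt;alert(1)&lt;/script&gt;</td><td>plain</td></tr>
```

Escaping each value once while the row is built means the text is not rescanned afterwards, which may help with the performance concern in the question.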

If you cannot modify code at the back end (server side), you need an HTML parser that sanitises the markup. JavaScript is not the only threat: embedded content (<object>, <iframe>, ...) can also be malicious. For a very detailed HTML parser and sanitizer, see the following answer:
Can I load an entire HTML document into a document fragment in Internet Explorer?

Rob W
3

If you control the back end, you can serve the file with this header:

Content-Type: text/plain
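
A minimal sketch of what that could look like, assuming a Python back end built on the standard library's `wsgiref` (the thread does not say which server is in use). `UNTRUSTED_TEXT`, `app`, and `serve` are hypothetical names, and the `X-Content-Type-Options: nosniff` header is an extra precaution against content sniffing that is not part of the answer itself:

```python
from wsgiref.simple_server import make_server

# Stand-in for the contents of the real text file (assumption).
UNTRUSTED_TEXT = "Huge wall of text 1<a2 &>1"


def app(environ, start_response):
    """WSGI app that returns the untrusted text as plain text, not HTML."""
    body = UNTRUSTED_TEXT.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Ask browsers not to guess a different content type.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Run a local development server; call this manually to try it out."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Note that this only helps when the raw text is shown on its own; it does not apply if the text has to appear inside an HTML table, where escaping is still needed.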
Marian Bazalik
1

No, you can't disable JavaScript from inside a webpage. Instead, you should sanitize any and all input from your users to make sure no malicious scripts get through.

Whether it's by removing all script tags or by replacing < and >, you need to make sure your input is clean.

Madara's Ghost
1

Do a search for <script and replace with <!--<script and search for </script> and replace with </script>-->.

This should comment out all scripts in the file.

Ayush
  • This is far from a complete solution - see https://owasp.org/www-community/xss-filter-evasion-cheatsheet for some of the ways that people could evade this. – SomeoneElse Jun 01 '21 at 13:41
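
The gap pointed out in the comment can be seen with a small Python sketch of the search-and-replace approach. `comment_out_scripts` is a hypothetical name, and the payloads are only illustrative; many other variants exist:

```python
def comment_out_scripts(text: str) -> str:
    """Naive filter: wrap <script>...</script> blocks in HTML comments."""
    text = text.replace("<script", "<!--<script")
    return text.replace("</script>", "</script>-->")


# The straightforward case is commented out as intended.
print(comment_out_scripts("<script>alert(1)</script>"))
# <!--<script>alert(1)</script>-->

# An event-handler attribute contains no <script tag, so it passes unchanged.
print(comment_out_scripts("<img src=x onerror=alert(1)>"))
# <img src=x onerror=alert(1)>

# The replacement is case-sensitive, so an upper-case tag also passes unchanged.
print(comment_out_scripts("<SCRIPT>alert(1)</SCRIPT>"))
# <SCRIPT>alert(1)</SCRIPT>
```

Because the last two inputs come out unmodified, a browser would still run them, which is why escaping or a real sanitizer is the safer route.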
0

You need a sandbox or sanitized HTML. Have a look at PHPIDS or HTML Purifier.

e-info128