
Is there a way to get all the content of the page (HTML and CSS) but exclude all the JavaScript functions and script src tags?

    var htmlPage = $("html").html();
    console.log(htmlPage);

I know that will give me all of it, but I need to exclude the JS from the results.

benji_r
  • You can try this http://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression – Anton Dmitrievich Feb 16 '17 at 18:48
  • Possible duplicate of [Removing all script tags from html with JS Regular Expression](http://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression) – Heretic Monkey Feb 16 '17 at 18:52
  • Ok, I had to set the regex to non-greedy because it removed way too much HTML. Now it works. – pid Feb 16 '17 at 18:56

2 Answers


EDIT: fixed the regex (non-greedy version)

You can try this:

var htmlPage = $("html").html().replace(/<script[\s\S]*?<\/script>/mig, "");

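The regular expression matches every <script> ... </script> element and replaces it with an empty string. The lazy [\s\S]*? makes each match stop at the first </script> it reaches, which is why the non-greedy version no longer removes the HTML sitting between two scripts.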
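A minimal sketch of the same idea as a reusable function, run on a plain string so it can be tried outside the page. The names `stripScripts`, `stripScriptsGreedy` and the `sample` markup are hypothetical. It shows why the non-greedy `*?` matters: with a greedy `*`, the match runs from the first `<script` to the last `</script>` and deletes everything in between. The `m` flag from the answer is dropped because it only changes `^` and `$`, which this pattern doesn't use; `i` is kept so `<SCRIPT>` is matched too.

```javascript
// Hypothetical helper: remove every <script>...</script> element from an HTML string.
// [\s\S] matches any character including newlines; *? stops at the FIRST closing tag.
function stripScripts(html) {
  return html.replace(/<script[\s\S]*?<\/script>/gi, "");
}

// Greedy variant for comparison: [\s\S]* runs on to the LAST </script> in the string.
function stripScriptsGreedy(html) {
  return html.replace(/<script[\s\S]*<\/script>/gi, "");
}

const sample =
  '<p>one</p><script src="a.js"></script><p>two</p><script>alert(1);</script><p>three</p>';

console.log(stripScripts(sample));       // <p>one</p><p>two</p><p>three</p>
console.log(stripScriptsGreedy(sample)); // <p>one</p><p>three</p>  (lost <p>two</p>)
```

Keep in mind a regex is not an HTML parser, so unusual markup can still slip through; it is fine for a quick cleanup of a page you control.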


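BTW, this is kind of a lucky shot. If this code sits in an inline <script> on the same page, the regex's own source text is part of the HTML being searched, and inside a regex literal the slash of the closing tag has to be escaped with a \ backslash: <\/script>.

That escape is why the regex doesn't match itself. If it did, the match would end inside the script and leave the rest of its code in the output. Because the source contains <\/script> rather than </script>, it isn't self-similar anymore, so the match runs on to the real closing tag and it works.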
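To see the self-matching point in action, here is a sketch that runs the regex over a made-up page fragment whose inline script contains the answer's own `.replace(...)` call (the fragment is an illustration, not real page output). Because the fragment holds `<\/script>` instead of `</script>`, the lazy match skips past it and ends at the real closing tag.

```javascript
// Made-up fragment: an inline script that contains the regex source itself.
// In this JS string "\\" is a single backslash, so the fragment holds the
// characters <\/script> inside the regex literal, exactly as on a real page.
const page =
  "<p>before</p>" +
  '<script>var htmlPage = $("html").html().replace(/<script[\\s\\S]*?<\\/script>/mig, "");</script>' +
  "<p>after</p>";

// Same pattern as the answer (minus the unused m flag).
const cleaned = page.replace(/<script[\s\S]*?<\/script>/gi, "");

console.log(cleaned); // <p>before</p><p>after</p>
```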

pid

Another option is to read Element.innerHTML of only the elements whose content you want, such as document.head and document.body. For example:

<!doctype html>
<html>
  <head>
    <!-- CSS links go here -->
  </head>
  <body>
    <!-- Your content -->
    <p>Hello World</p>
  </body>
  <script>
    // JS
  </script>
</html>

var body = document.body.innerHTML;
var head = document.head.innerHTML;

Then you can concatenate them or process them however you want.
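One caveat: the HTML parser moves a <script> written after </body> into <body>, so document.body.innerHTML will still contain it. A sketch of a DOM-based alternative that avoids regex entirely: deep-clone the <html> element, delete every <script> element from the clone, then read the clone's innerHTML (the same markup $("html").html() returns, minus the scripts). The function name `getHtmlWithoutScripts` is hypothetical, and it has to run in a browser because it reads the live document. Removing <script> elements covers both inline code and script src tags; inline handler attributes such as onclick are left alone.

```javascript
// Hypothetical helper: page markup (head + body) without any <script> elements.
// Works on a deep clone, so the live page is not modified.
function getHtmlWithoutScripts(doc = document) {
  const clone = doc.documentElement.cloneNode(true); // deep copy of <html>
  // Remove both inline scripts and <script src="..."> tags from the copy.
  clone.querySelectorAll("script").forEach((el) => el.remove());
  return clone.innerHTML; // like $("html").html(), without the scripts
}

// Usage (browser console):
// console.log(getHtmlWithoutScripts());
```

With jQuery, which the question already uses, the same steps fit in one chain: `$("html").clone().find("script").remove().end().html()`.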

Cesar Jr Rodriguez
  • I wouldn't recommend putting anything outside the `<head>` and `<body>` tags. The HTML5 standard doesn't allow that and it breaks the page in some browsers. It is also bad practice to write non-validating HTML in 2017. For more, look here: http://stackoverflow.com/questions/3037725/is-it-wrong-to-place-the-script-tag-after-the-body-tag. I won't downvote, because this comment is enough of a correction. Have a nice day, sir. – pid Feb 16 '17 at 19:06
  • You're completely right. Thank you for the information. – Cesar Jr Rodriguez Feb 16 '17 at 19:19