How to get only html content containing no js code with document.body.innerHTML?

Question

div#html,div#css,div#js,div#run{
    border:1px solid red;
    height:80px;
    width:80px;
    float:left;
}
div#content{
    clear:both;
    width:400px;
    height:200px;
    border:1px solid black;
}
textarea{
    overflow:auto;
}

<div id='html'>html</div>
<div id='css'>css</div>
<div id='js'>js</div>
<div id='run'>run</div>
<div id='content'>
</div>

Now I want to get only the HTML content.

var content=document.body.innerHTML; alert(content)

The alert shows the HTML content together with my JS code.

How can I get only the HTML content, without the JS code?

Why can't I remove it with str.replace?

var content=document.body.innerHTML;
var reg = new RegExp('<script type="text/javascript">.+</script>');
var onlyHtml = content.replace(reg,"");
alert(onlyHtml);

Here is the whole HTML file:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title></title>
    <style type='text/css'>
    div#html,div#css,div#js,div#run{
        border:1px solid red;
        height:80px;
        width:80px;
        float:left;
    }
    div#content{
        clear:both;
        width:400px;
        height:200px;
        border:1px solid black;
    }
    textarea{
        overflow:auto;
    }
   </style>
</head>
<body>
    <div id='html'>html</div>
    <div id='css'>css</div>
    <div id='js'>js</div>
    <div id='run'>run</div>
    <div id='content'>
    </div>        
    <script type='text/javascript'>
    var content=document.body.innerHTML;
    var reg = new RegExp("<script type='text/javascript'>.+</script>");
    var onlyHtml = content.replace(reg,"");
    alert(onlyHtml);
    </script>    
</body>
</html>

Why can't I extract only the HTML with a regular expression?
To verify my regular expression:

var content = "<p>test</p><script type='text/javascript'>somany lines and \
              so many lines</script>"
var reg = new RegExp("<script type='text/javascript'>.+</script>");
var onlyHtml = content.replace(reg,"");
alert(onlyHtml);

This outputs:

<p>test</p>

score 0 · Answer 1 · answered Sep 30 '20 at 01:50

innerText returns only the rendered text, with no markup, so you can get the text content of the page with document.body.innerText.

const content = document.body.innerText;
alert(content);

score 0 · Answer 2 · answered Sep 30 '20 at 02:22

Clone the body, remove the script tags (and anything else you don't want) from the clone, and read the clone's innerHTML. The live page is left untouched.

document.querySelector("button.show").addEventListener("click", function () {
  var bodyClone = document.querySelector('body').cloneNode(true);
  bodyClone.querySelectorAll('script, button.show').forEach(elem => elem.remove());
  console.log(bodyClone.innerHTML);  
});

<div id="wrapperElem">
  <div id='html'>html</div>
  <div id='css'>css</div>
  <div id='js'>js</div>
  <div id='run'>run</div>
  <div id='content'>content</div>
  <button class="show">Show</button>
</div>
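
The same idea can be wrapped in a small helper that works on the existing page without any extra markup. This is only a sketch: the function name htmlWithoutScripts is made up for illustration, and it assumes the element passed in supports the standard cloneNode, querySelectorAll, and innerHTML members.

```javascript
// Hypothetical helper: returns root's inner HTML with every <script> element left out.
function htmlWithoutScripts(root) {
  // Deep-clone so the live page is never modified.
  const clone = root.cloneNode(true);
  // Remove the script elements from the copy only.
  clone.querySelectorAll('script').forEach((el) => el.remove());
  return clone.innerHTML;
}

// In a browser, log the body's HTML without scripts.
// The guard keeps the snippet harmless where no DOM exists.
if (typeof document !== 'undefined') {
  console.log(htmlWithoutScripts(document.body));
}
```

Because the removal happens on a clone, the page's own structure and its running scripts stay as they are.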

epascarello


Please do not change my HTML structure or add new elements to it. – showkey Sep 30 '20 at 02:24
Why can't I extract only the HTML with a regular expression? – showkey Sep 30 '20 at 02:25
@showkey I added more than one element, and the extra elements make ZERO difference. – epascarello Sep 30 '20 at 02:26
Why no reg exp? Look at this: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – epascarello Sep 30 '20 at 02:28
Why doesn't replacing the JS code with a regular expression work? – showkey Sep 30 '20 at 02:28
There is no need for a reg exp... Feel free to try it... – epascarello Sep 30 '20 at 02:28

score 0 · Answer 3 · answered Sep 30 '20 at 08:23

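var content = document.body.innerHTML;
// "." does not match line breaks, so ".+" cannot span a multi-line script.
// "[^]" (an empty negated class) matches any character, newlines included.
var reg = new RegExp('<script type="text/javascript">[^]+</script>');
var onlyHtml = content.replace(reg, "");
alert(onlyHtml);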
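
The likely reason the pattern from the question fails on the real page is that "." does not match line terminators, while the script block spans several lines. The sketch below shows the difference on a made-up sample string (it stands in for document.body.innerHTML, which serializes attribute values with double quotes). It uses [\s\S]*? instead of [^]+, an assumption on my part that a lazy match is preferable because it stops at the first closing tag when the page has more than one script.

```javascript
// Hypothetical sample standing in for document.body.innerHTML.
const sample =
  '<div id="content">\n</div>\n' +
  '<script type="text/javascript">\nvar a = 1;\nalert(a);\n<\/script>';

// "." never matches a newline, so this cannot span the multi-line script body.
const dotPattern = /<script\b[^>]*>.+<\/script>/;

// [\s\S] matches any character, newlines included; *? is lazy,
// so each match ends at the nearest closing tag.
const anyCharPattern = /<script\b[^>]*>[\s\S]*?<\/script>/gi;

const unchanged = sample.replace(dotPattern, '');
const stripped = sample.replace(anyCharPattern, '');

console.log(unchanged === sample); // true: nothing was removed
console.log(stripped);             // only the div markup remains
```

If this code lives in an inline script, write the closing tag as <\/script> inside string literals, as above; otherwise the HTML parser ends the script block at that point. A regular expression is still fragile for HTML in general, so treat this as a workaround for simple pages.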

showkey
