div#html,div#css,div#js,div#run{
    border:1px solid red;
    height:80px;
    width:80px;
    float:left;
}
div#content{
    clear:both;
    width:400px;
    height:200px;
    border:1px solid black;
}
textarea{
    overflow:auto;
}
<div id='html'>html</div>
<div id='css'>css</div>
<div id='js'>js</div>
<div id='run'>run</div>
<div id='content'>
</div> 

Now I want to get only the HTML content.

var content=document.body.innerHTML; alert(content)

The alert shows the HTML content together with my JS code.

How can I get only the HTML content, without the JS code?

Why can't I get it with str.replace?

var content=document.body.innerHTML;
var reg = new RegExp('<script type="text/javascript">.+</script>');
var onlyHtml = content.replace(reg,"");
alert(onlyHtml);

The whole HTML file:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title></title>
    <style type='text/css'>
    div#html,div#css,div#js,div#run{
        border:1px solid red;
        height:80px;
        width:80px;
        float:left;
    }
    div#content{
        clear:both;
        width:400px;
        height:200px;
        border:1px solid black;
    }
    textarea{
        overflow:auto;
    }
   </style>
</head>
<body>
    <div id='html'>html</div>
    <div id='css'>css</div>
    <div id='js'>js</div>
    <div id='run'>run</div>
    <div id='content'>
    </div>        
    <script type='text/javascript'>
    var content=document.body.innerHTML;
    var reg = new RegExp("<script type='text/javascript'>.+</script>");
    var onlyHtml = content.replace(reg,"");
    alert(onlyHtml);
    </script>    
</body>
</html>

Why can't I extract only the HTML with a regular expression?
To verify my regular expression:

var content = "<p>test</p><script type='text/javascript'>somany lines and \
              so many lines</script>"
var reg = new RegExp("<script type='text/javascript'>.+</script>");
var onlyHtml = content.replace(reg,"");
alert(onlyHtml);

It gets:

<p>test</p>
showkey

3 Answers

innerText returns only the rendered text, with no markup or script source, so you can read the text with document.body.innerText.

const content = document.body.innerText;
alert(content);
div#html,div#css,div#js,div#run{
    border:1px solid red;
    height:80px;
    width:80px;
    float:left;
}
div#content{
    clear:both;
    width:400px;
    height:200px;
    border:1px solid black;
}
textarea{
    overflow:auto;
}
<div id='html'>html</div>
<div id='css'>css</div>
<div id='js'>js</div>
<div id='run'>run</div>
<div id='content'>
</div>
Derek Wang
Clone the element and remove the script tags and whatever else you want.

document.querySelector("button.show").addEventListener("click", function () {
  var bodyClone = document.querySelector('body').cloneNode(true);
  bodyClone.querySelectorAll('script, button.show').forEach(elem => elem.remove());
  console.log(bodyClone.innerHTML);  
});
<div id="wrapperElem">
  <div id='html'>html</div>
  <div id='css'>css</div>
  <div id='js'>js</div>
  <div id='run'>run</div>
  <div id='content'>content</div>
  <button class="show">Show</button>
</div>
epascarello
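var content = document.body.innerHTML;
// [^] matches any character, including the line breaks that "." skips.
// innerHTML serializes attributes with double quotes, so the pattern uses them too.
// "<\/script>" keeps the HTML parser from ending an inline script early.
var reg = new RegExp('<script type="text/javascript">[^]+<\/script>');
var onlyHtml = content.replace(reg, "");
alert(onlyHtml);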
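
To see why the pattern in the question fails on the real page but passes the one-line check, here is a minimal sketch that runs without a browser. The `sample` string is a hypothetical stand-in for what `document.body.innerHTML` would return (double-quoted attributes, a script body spanning several lines); the variable names are illustrative and not from the question.

```javascript
// Hypothetical stand-in for document.body.innerHTML: attributes come back
// double-quoted and the script body spans several lines.
const sample = [
  '<div id="content"></div>',
  '<script type="text/javascript">',
  'var content = document.body.innerHTML;',
  'alert(content);',
  '<\/script>',
].join('\n');

// "." does not match line breaks, so ".+" can never reach the closing tag.
const dotPattern = new RegExp('<script type="text/javascript">.+<\/script>');

// "[^]" matches any character, line breaks included ("[\s\S]" behaves the same).
const anyCharPattern = new RegExp('<script type="text/javascript">[^]+<\/script>');

const withDot = sample.replace(dotPattern, '');
const withAnyChar = sample.replace(anyCharPattern, '');

// A backslash line continuation puts no newline into the string, which is
// why the check in the question appears to succeed with ".+".
const continued = "<p>test</p><script type='text/javascript'>somany lines and \
so many lines<\/script>";
const continuedResult = continued.replace(
  new RegExp("<script type='text/javascript'>.+<\/script>"),
  ''
);

console.log(withDot === sample);          // true: nothing was removed
console.log(JSON.stringify(withAnyChar)); // the markup with the script block gone
console.log(continuedResult);             // <p>test</p>
```

A regular expression is still fragile here: it assumes one exact `type` attribute and, being greedy, removes everything between the first opening and the last closing script tag. Cloning the body and removing the script elements, as in the other answer, avoids both assumptions.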
showkey