I have a very long string that is made up of a few HTML documents jammed together like this:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info 
</head>
<body>
<div > some content with other HTML tags that I want to preserve </div>
<body>
</html>
<html>
<div> another content with other HTML tags that I want to preserve </div>
</html>
<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info 
</head>
<body>
<div> some other content with other HTML tags that I want to preserve </div>
<body>
</html>

and I would like to turn them into something like this:

<div > some content with other HTML tags that I want to preserve </div>
<div> another content with other HTML tags that I want to preserve </div>
<div> some other content with other HTML tags that I want to preserve </div>

Basically, I'm looking for a regex to remove just the <html> and </html> tags (not the other, inner HTML elements) from a huge HTML string. Please note that I need to preserve the HTML content and only get rid of the parent tags.

Thanks in advance

(Please note that I have done an extensive search to make sure this is not a duplicate question)

AleX_

1 Answer

An important note first: parsing HTML with regular expressions is generally discouraged, see https://stackoverflow.com/a/1732454/3498950

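But if you must, I might use something like /<\/?html.*?>/g, which matches both the opening <html ...> tag (with any attributes) and the closing </html> tag. Note that . does not match newlines, so an opening tag that spans several lines would not be matched: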

const html = `<html xmlns:v="urn:schemas-microsoft-com:vml">
<head>head info</head>
<div>other content</div>
</html>`;

console.log(html.replace(/<\/?html.*?>/g, '').trim());

And for tweaking the regex: https://regex101.com/r/EeTv68/1
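
The desired output in the question also drops the <head> blocks and the <body> tags, which the regex above leaves in place. A minimal sketch of one way to cover that, assuming no attribute value contains a literal > and that the head content never contains the text </head>. The function names stripHtmlTags and stripWrappers are made up for this example and the sample string is a shortened version of the one in the question:

```javascript
// Hypothetical helper names; they are not part of the original answer.

// Removes only <html ...> and </html> tags, keeping everything between them.
// \b keeps the pattern from matching tags that merely start with "html";
// [^>]* (unlike .*?) also matches newlines inside a multi-line opening tag.
function stripHtmlTags(input) {
  return input.replace(/<\/?html\b[^>]*>/gi, '');
}

// Additionally drops each <head>...</head> block and any <body>/</body> tags,
// then trims every line and removes the blank lines left behind.
function stripWrappers(input) {
  return stripHtmlTags(input)
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, '')
    .replace(/<\/?body\b[^>]*>/gi, '')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '')
    .join('\n');
}

// Shortened sample; the first opening tag deliberately spans two lines.
const sample = `<html xmlns:v="urn:schemas-microsoft-com:vml"
      xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info
</head>
<body>
<div> some content </div>
<body>
</html>
<html>
<div> another content </div>
</html>`;

console.log(stripWrappers(sample));
// <div> some content </div>
// <div> another content </div>
```

Trimming each line also removes indentation inside the preserved content, so skip the split/map/filter/join steps if whitespace matters. For anything less regular than this input, a real HTML parser is the safer choice.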

spencer.sm