Regular expression HTML tag javascript

Question

I want to verify whether the code a user enters is HTML code (it must start with <html> and end with </html>).

I tried this:

var reghtml = new RegExp("(<html>*\n+</html>)");

but the problem is that this requires a \n in the code. I need to verify the opening and closing tags (<html> and </html>), and if the user writes anything between them, it must start with < and end with >.

Is there any solution?
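For reference, this is what the regex in the question actually matches. In `<html>*` the `*` applies only to the preceding `>`, and `\n+` requires one or more newlines right after it, so nothing except line breaks may appear between the tags. The sample inputs are made up for illustration.

```javascript
// The regex from the question, unchanged
var reghtml = new RegExp("(<html>*\n+</html>)");

// `<html>*` means "<html" followed by zero or more ">" characters.
// The string's \n becomes a newline character, and + repeats it one or more times.
console.log(reghtml.test("<html>\n</html>"));          // true: only a newline between the tags
console.log(reghtml.test("<html></html>"));            // false: \n+ needs at least one newline
console.log(reghtml.test("<html>\n<p></p>\n</html>")); // false: no tags allowed in between
console.log(reghtml.test("<html\n</html>"));           // true: zero ">" characters also match
```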

Sorry, *and if he make something between them is necessary to start with `<` and end with `>`* is rather unclear. — Wiktor Stribiżew, Nov 26 '16 at 22:32
@WiktorStribiżew if he writes it like this `` it's correct, but if he wants to write something between the tags, it must start with `<` and end with `>`, for example ` test ` => error | ` ` => correct — saadsaad, Nov 26 '16 at 22:37
Something like `/^(?:\s*<[^>]*>)*<\/html>$/.test(your_html)`? — Wiktor Stribiżew, Nov 26 '16 at 22:43
Have you looked into validation without regular expressions? Regex and HTML don't mix very well — Dbz, Nov 26 '16 at 22:57
@WiktorStribiżew yes it's work, but there's a small mistake, if I Back to line and I write a correct code it's give me **error** — saadsaad, Nov 26 '16 at 23:22
I do not understand *if I Back to line and I write a correct code it's give me error*. Please provide some valid and invalid inputs. — Wiktor Stribiżew, Nov 27 '16 at 08:28
@WiktorStribiżew if I write it like this `` it works correctly (this is what I want), but if I write `` , press Enter and write `` , then press Enter a second time and write `` , it gives me **error** — saadsaad, Nov 27 '16 at 11:35
I [cannot repro](https://jsfiddle.net/h0r2tb1n/), please provide a js fiddle to show the issue. — Wiktor Stribiżew, Nov 27 '16 at 11:47
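A minimal sketch of the approach from the comments, assuming "valid" means: starts with a bare `<html>` (no attributes), ends with `</html>`, and contains only tags and whitespace in between. Compared with the comment's pattern, it anchors the opening `<html>` explicitly and allows whitespace, including newlines, before `</html>` and at either end. The helper name `isTagsOnly` is hypothetical, and it does not check that tags are paired or nested correctly.

```javascript
// Hypothetical helper: true if `input` is <html>...</html> with only tags
// (anything of the form <...>) and whitespace in between.
function isTagsOnly(input) {
    // ^\s*<html>          opening tag, optional leading whitespace
    // (?:\s*<[^>]*>)*     any number of tags, each optionally preceded by whitespace/newlines
    // \s*<\/html>\s*$     closing tag, optional whitespace around it
    return /^\s*<html>(?:\s*<[^>]*>)*\s*<\/html>\s*$/.test(input);
}

console.log(isTagsOnly("<html></html>"));                  // true
console.log(isTagsOnly("<html>\n<p></p>\n<br>\n</html>")); // true: newlines are fine
console.log(isTagsOnly("<html> test </html>"));            // false: bare text between tags
console.log(isTagsOnly("<p></p>"));                        // false: no <html> wrapper
```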

score 2 · Answer 1 · edited May 23 '17 at 10:30

You shouldn't use regular expressions to validate HTML (let alone parse it) because HTML is not a regular language.

Here's an example of a false negative: a valid document that a typical regex-based validator would mark as invalid, because the </html> inside the comment looks like the end of the document:

<html>
<head>
    <!-- </html> -->
</head>
<body>
    <p>This is valid HTML</p>
</body>
</html>
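To make the false negative concrete, here is a hypothetical naive check that accepts anything between `<html>` and the first `</html>`, then requires the input to end. The pattern is an illustration, not one taken from the question.

```javascript
// Hypothetical naive validator: <html>, then anything that is NOT "</html>", then </html>
var naive = /^\s*<html>(?:(?!<\/html>)[\s\S])*<\/html>\s*$/;

var withComment = [
    "<html>",
    "<head>",
    "    <!-- </html> -->",
    "</head>",
    "<body>",
    "    <p>This is valid HTML</p>",
    "</body>",
    "</html>"
].join("\n");

// The </html> inside the comment ends the match early, and the
// markup that follows it makes the whole test fail
console.log(naive.test(withComment));                // false, although browsers parse it fine
console.log(naive.test("<html><p>hi</p></html>"));   // true
```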

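Trying to skip comments doesn't make it straightforward either. HTML comments don't actually nest (the first --> closes the comment, and the HTML spec reports a nested <!-- as a parse error), but browsers still parse markup like this, so a regex would have to reproduce the parser's error handling: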

<html>
<head>
    <!-- <!-- <!-- <!-- </html> -->
</head>
<body>
    <p>This is valid HTML</p>
</body>
</html>

And here's a false positive (assuming you don't use the ^ and $ regex anchors):

<p>illegal element</p>
<html>
    <img>illegal text node</img>
</html>
<p>another illegal element</p>
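A short demonstration of that false positive, using a hypothetical unanchored pattern. Adding `^` and `$` rejects this particular input, but anchors do nothing about the misused `<img>` inside.

```javascript
// Hypothetical unanchored check: "does an <html>...</html> pair appear anywhere?"
var unanchored = /<html>[\s\S]*<\/html>/;

var invalidDoc = [
    "<p>illegal element</p>",
    "<html>",
    "    <img>illegal text node</img>",
    "</html>",
    "<p>another illegal element</p>"
].join("\n");

// Without ^ and $ the regex matches the middle of the string
console.log(unanchored.test(invalidDoc)); // true, although the document is invalid

// Anchoring (plus \s* for surrounding whitespace) rejects the stray <p> elements...
var anchored = /^\s*<html>[\s\S]*<\/html>\s*$/;
console.log(anchored.test(invalidDoc)); // false

// ...but still accepts the <img> element with a text node inside it
console.log(anchored.test("<html>\n<img>illegal text node</img>\n</html>")); // true
```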

Granted, some regular-expression engines add rudimentary support for things like counting nesting depth (for example, recursion in PCRE or balancing groups in .NET; JavaScript's RegExp has neither), but then you're in for a world of hurt.

The correct way to validate HTML is to use an HTML DOM library. In .NET, that means something like HtmlAgilityPack. In browser-based JavaScript it's even simpler: just use the browser's built-in parser (innerHTML):

(stolen from "Check if HTML snippet is valid with Javascript")

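function isValidHtml(html) {
    // Parse into a separate, detached document rather than the current page
    var doc = document.implementation.createHTMLDocument("");
    // documentElement is the <html> element itself, so an outer <html>...</html>
    // wrapper in the input is dropped and the comparison below fails for it
    doc.documentElement.innerHTML = html;
    // Strict round-trip check: the parser's re-serialization must match the input exactly
    return ( doc.documentElement.innerHTML === html );
}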

score 1 · Answer 2 · answered Nov 26 '16 at 22:52

Here's a pattern for you. It checks that each first-level element inside <html> has a valid opening tag and a matching closing tag. First-level elements must have closing tags, so <html><img /></html> is rejected; if you need that, you have to remove the closing-tag part of the pattern.

var validHtml = '\
<html itemscope>\
 <head></head>\
 <body style="background: red;">\
  Everything is fine\
 </body>\
</html>\
',
 invalidHtml = '\
<html itemscope>\
 <head></foot>\
 <body>\
  Nothing is fine\
 </body>\
</html>\
',
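 // [\s\S]* matches any character including newlines, like (?:.|\s)*, but without
 // overlapping alternatives that can make a failed match backtrack excessively
 pattern = /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>[\s\S]*<\/\1>\s*)*<\/html>\s*$/i;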
 
console.log(pattern.test(validHtml) ? 'valid' : 'invalid');
console.log(pattern.test(invalidHtml) ? 'valid' : 'invalid');
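A few extra inputs, made up for illustration, that exercise the limits described above. The pattern is copied from this answer, written with `[\s\S]*`, which is equivalent to `(?:.|\s)*`.

```javascript
var pattern = /^\s*<html(?:\s[^>]*)?>(?:\s*<(\w+)(?:\s[^>]+)?>[\s\S]*<\/\1>\s*)*<\/html>\s*$/i;

// Line breaks between elements are fine, since \s covers them
console.log(pattern.test("<html>\n<p>hi</p>\n</html>")); // true

// A first-level element without a closing tag is rejected, as noted above
console.log(pattern.test("<html><img /></html>")); // false

// Only first-level tags are paired up: [\s\S]* does not look inside an
// element, so a mismatched inner tag still passes
console.log(pattern.test("<html><body><div></span></body></html>")); // true
```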
