I want to know whether an input HTML string is valid. I looked at various HTML parsers, but none of them seems to offer a method for validating HTML. Jsoup comes closest to what I want, but it generates valid parsed HTML rather than telling me whether the input was valid. Basically, I want to check for a valid HTML structure like the one below:
<html>
<head>~</head>
<body>~</body>
</html>
So I wrote the following code in Java:
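import java.util.regex.Matcher;
import java.util.regex.Pattern;

String html = "<html><head><title>asdf</title></Head><body>asfd</body></html>";
// (?i) makes the match case-insensitive, so </Head> is accepted as </head>
String compile = "(?i)<html.*>.*<head>.*?</head>.*<body>.*</body>.*</html>";
Pattern pattern = Pattern.compile(compile);
Matcher matcher = pattern.matcher(html);
// matches() requires the whole input to match the pattern
if (matcher.matches()) {
    System.out.println("Valid html");
} else {
    System.out.println("Invalid html");
}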
But if the HTML contains two <head> elements, it is still reported as valid. How can I check for a valid HTML structure efficiently?
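To make the problem concrete, here is a minimal reproduction using the same pattern as above. The class name DuplicateHeadDemo and the helper looksValid are just names made up for this example, and the input string is a hypothetical one with a second, empty <head> element.

```java
import java.util.regex.Pattern;

class DuplicateHeadDemo {
    // Same pattern as in the code above.
    private static final Pattern STRUCTURE =
            Pattern.compile("(?i)<html.*>.*<head>.*?</head>.*<body>.*</body>.*</html>");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean looksValid(String html) {
        return STRUCTURE.matcher(html).matches();
    }

    public static void main(String[] args) {
        // Two <head> elements, which should not count as a valid structure.
        String twoHeads = "<html><head></head><head></head><body>asfd</body></html>";
        // The ".*" between </head> and <body> swallows the second <head> element.
        System.out.println(looksValid(twoHeads) ? "Valid html" : "Invalid html");
    }
}
```

This prints "Valid html", because the ".*" after </head> matches any text, including the extra <head></head>.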