I have the following HTML string:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Demo Website</title>
</head>
<body class="home">
<div class="container-fluid">
<h1 class="subtitle">Subtitle</h1>
<h1 class="title">title</h1>
<p>paragraph...</p>
</div>
</body>
</html>
I need to extract every <tag>...</tag> pair, regardless of its nesting level.
So the output should look like this:
<html> ... </html>
<head> ... </head>
<title> ... </title>
<body class="home"> ... </body>
<div class="container-fluid"> ... </div>
<h1 class="subtitle"> ... </h1>
<h1 class="title"> ... </h1>
<p> ... </p>
I have been trying to match it using the following pattern (with ignore-case and single-line options):
<([\w_]+?)\b[^>]*>(.*?)</\1>
But the only match I get is the outermost <html>...</html>, since that match spans the entire string and already contains all the other tags.
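For reference, here is a minimal reproduction of the behaviour. It is sketched in Python, which is an assumption on my part: `re.IGNORECASE` and `re.DOTALL` stand in for the ignore-case and single-line options, and `matched_tag_names` is just a hypothetical helper name for this example. Other regex engines may differ in details.

```python
import re

# The sample document from the question.
HTML = """<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Demo Website</title>
</head>
<body class="home">
<div class="container-fluid">
<h1 class="subtitle">Subtitle</h1>
<h1 class="title">title</h1>
<p>paragraph...</p>
</div>
</body>
</html>"""

# The pattern from the question, unchanged. IGNORECASE and DOTALL are assumed
# to be the equivalents of the "ignore-case" and "single-line" options.
PATTERN = re.compile(r"<([\w_]+?)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def matched_tag_names(text):
    """Return the tag name of every non-overlapping match, in order."""
    return [m.group(1) for m in PATTERN.finditer(text)]


names = matched_tag_names(HTML)
print(names)  # ['html']
```

The scan finds <html>...</html> first and then resumes after the closing </html>, because matches returned this way do not overlap. Everything nested inside has already been consumed by that first match, so nothing else is reported.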
How can I match every element, including the nested ones?