I have the following HTML string:

<!doctype html>
<html>
    <head>
      <meta charset="utf-8">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
      <meta name="viewport" content="width=device-width, initial-scale=1">
      <title>Demo Website</title>
    </head>
    <body class="home">
      <div class="container-fluid">
        <h1 class="subtitle">Subtitle</h1>
        <h1 class="title">title</h1>
        <p>paragraph...</p>
      </div>
    </body>
</html>

I need to match every <tag>...</tag> element that has an explicit closing tag, irrespective of its nesting level. So the output should look like:

<html> ... </html>
<head> ... </head>
<title> ... </title>
<body class="home"> ... </body>
<div class="container-fluid"> ... </div>
<h1 class="subtitle"> ... </h1>
<h1 class="title"> ... </h1>
<p> ... </p>

I have been trying to match it using the following pattern (with ignore-case and single-line options):

<([\w_]+?)\b[^>]*>(.*?)</\1>

But the only match I get is the outermost <html>...</html>, since it encloses the entire string and the regex engine does not report matches that lie inside an earlier match.

How do I go about it?
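
One possible direction, sketched here in Python rather than .NET purely for illustration: wrapping the whole pattern in a lookahead makes each match zero-width, so the engine can start a new attempt at every opening tag instead of skipping past the text consumed by <html>...</html>. The names `opening_tags`, `ORIGINAL` and `OVERLAPPING` are hypothetical helpers, not part of any library, and this is a sketch under the assumption that no element contains another element with the same tag name.

```python
import re

HTML = """<!doctype html>
<html>
    <head>
      <meta charset="utf-8">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
      <meta name="viewport" content="width=device-width, initial-scale=1">
      <title>Demo Website</title>
    </head>
    <body class="home">
      <div class="container-fluid">
        <h1 class="subtitle">Subtitle</h1>
        <h1 class="title">title</h1>
        <p>paragraph...</p>
      </div>
    </body>
</html>"""

# Ignore-case and single-line ("dot matches newline") options, as in the question.
FLAGS = re.IGNORECASE | re.DOTALL

# The pattern from the question: each match consumes its text, so nothing
# inside <html>...</html> is ever tried again.
ORIGINAL = re.compile(r"<([\w_]+?)\b[^>]*>(.*?)</\1>", FLAGS)

# Same idea wrapped in a lookahead: the match itself is empty and the element
# is captured in group 1, so matches may overlap (assumption: no element
# contains a nested element with the same tag name).
OVERLAPPING = re.compile(r"(?=(<(\w+)\b[^>]*>.*?</\2>))", FLAGS)


def opening_tags(pattern, text, group):
    """Return the opening tag of every element the pattern finds."""
    return [m.group(group).split(">", 1)[0] + ">" for m in pattern.finditer(text)]


original_hits = opening_tags(ORIGINAL, HTML, 0)
overlapping_hits = opening_tags(OVERLAPPING, HTML, 1)

print(original_hits)
for tag in overlapping_hits:
    print(tag, "...")
```

The <meta> tags and the doctype are skipped because no matching closing tag follows them. The lazy `.*?` stops at the first closing tag with the same name, so something like <div><div></div></div> would be cut short; handling that case is where .NET balancing groups (or an HTML parser) come in.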

Pradeep Kumar
  • See [*RegEx match open tags except XHTML self-contained tags*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Use an HTML parser, like HtmlAgilityPack, to easily parse HTML in .NET. – Wiktor Stribiżew Mar 25 '16 at 20:07
  • In the above link, and also in HtmlAgilityPack, how do I detect and ignore tags that don't have an explicit closing tag? E.g., I don't want `` in the result. – Pradeep Kumar Mar 25 '16 at 20:12
  • @PradeepKumar Try something with HtmlAgilityPack and then ask where you get stuck. – Eser Mar 25 '16 at 20:24
  • Obligatory link: [You can't parse (X)HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Lasse V. Karlsen Mar 25 '16 at 20:33
  • These are fixed text files that won't change. I know I can use an HTML DOM, e.g. by loading the page in a web browser or using HtmlAgilityPack, and there are plenty of ways to do it with a loop. But my question was specifically about regex: whether this kind of nesting/recursion is possible with regex. I'm trying to learn balancing groups in regex. – Pradeep Kumar Mar 28 '16 at 07:26
