Given the following HTML snippet:
<header>Student Directory</header>
<main>
<Student name="Pedro" age="23" />
<div id="student1">
<ul>
<li>Maths</li>
<li>English</li>
<li>Swedish</li>
</ul>
</div>
<Student name="Jane" age="15" />
</main>
<Footer />
In the above HTML snippet there are three custom tags that represent components. Components have a consistent format: they start with an opening angle bracket followed by a capital letter, and they are closed with />. I am trying to obtain all of the components as strings. Regex seems like the right approach, but I am new to regex. I have read about 'greedy' and 'non-greedy' approaches to achieving this, but as a novice I may miss best practices or do things inefficiently. Essentially, in the HTML example I am looking to obtain three strings:
<Student name="Pedro" age="23" />
<Student name="Jane" age="15" />
<Footer />
These are the only three components, along with their data, in the HTML. Any assistance would be greatly appreciated.
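A minimal sketch using Python's `re` module (the question names no language, so Python is an assumption here). Rather than choosing between greedy `.*` and non-greedy `.*?`, it uses a negated character class, `[^>]*`, which cannot run past the end of a single tag. It assumes no attribute value contains a `>` character. The `tricky` string is a hypothetical input added to show where the non-greedy version goes wrong: `.*?` stops at the *first* `/>`, but that can still be several tags later.

```python
import re

html = """<header>Student Directory</header>
<main>
<Student name="Pedro" age="23" />
<div id="student1">
<ul>
<li>Maths</li>
<li>English</li>
<li>Swedish</li>
</ul>
</div>
<Student name="Jane" age="15" />
</main>
<Footer />"""

# A component: "<", then an uppercase letter, then any characters except ">",
# then "/>". Because [^>]* can never cross a ">", each match stays inside one tag.
COMPONENT_RE = re.compile(r"<[A-Z][^>]*/>")

components = COMPONENT_RE.findall(html)
for c in components:
    print(c)
# <Student name="Pedro" age="23" />
# <Student name="Jane" age="15" />
# <Footer />

# Hypothetical input: a capitalised paired tag before a self-closing one.
tricky = "<Title>Hi</Title><Footer />"

# Non-greedy .*? still spans several tags, because the first "/>" is far away.
print(re.findall(r"<[A-Z].*?/>", tricky))  # ['<Title>Hi</Title><Footer />']

# The negated class cannot get past "<Title>", so only the real component matches.
print(COMPONENT_RE.findall(tricky))  # ['<Footer />']
```

The pattern `<[A-Z][^>]*/>` uses only basic syntax, so it should carry over to most regex engines. If attribute values may ever contain `>`, a real parser would be safer. Note, though, that Python's `html.parser` lowercases tag names, which would erase the capital-letter convention these components rely on.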