Given the following HTML snippet:

<header>Student Directory</header>
<main>
<Student name="Pedro" age="23" />
    <div id="student1">
        <ul>
            <li>Maths</li>
            <li>English</li>
            <li>Swedish</li>
        </ul>
</div>
<Student name="Jane" age="15" />
</main>
<Footer />

In the above HTML snippet there are three custom tags that represent components. Components have a consistent format: they start with an opening angle bracket followed by a capital letter, and they are closed with />. I am trying to obtain all of the components as strings. Regex seems like the correct approach, but I am new to regex and have only read about the 'greedy' and 'non-greedy' approaches to achieving this, so as a novice I may miss best practices or do things inefficiently. Essentially, in the HTML example I am looking to obtain three strings:

<Student name="Pedro" age="23" />
<Student name="Jane" age="15" />
<Footer />

These are the only three components, along with their data, in the HTML. Any assistance would be greatly appreciated.
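
For reference, the 'greedy' versus 'non-greedy' distinction mentioned above can be sketched as follows. This is only an illustration: the single-line input and the variable names line, greedy and lazy are assumptions made for the example, and neither pattern is meant as a finished solution.

```javascript
// Two components on a single line (an assumed input for illustration).
const line = '<Student name="Pedro" age="23" /> <Footer />';

// Greedy: ".*" consumes as much as possible, so the match runs to the LAST "/>".
const greedy = line.match(/<[A-Z].*\/>/g);

// Non-greedy: ".*?" consumes as little as possible, so each match stops at the FIRST "/>".
const lazy = line.match(/<[A-Z].*?\/>/g);

console.log(greedy.length); // 1 (both components swallowed into one match)
console.log(lazy.length);   // 2 (one match per component)
```

Note that "." does not match newlines by default, so on multi-line markup the greedy form only over-matches within a single line.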

Martyn Wynn
  • You should not use regex for parsing XML. More information [provided by Tony the Pony](https://stackoverflow.com/a/1732454/3670132) – Seblor Jan 19 '19 at 10:28
  • How are you referencing the snippet? And what is your broader use case? Why do you want to reference them as strings? – James Hibbard Jan 19 '19 at 10:28
  • There's a famous quote on regexes that fits this question: _Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems._ – maioman Jan 19 '19 at 10:28
  • My use case is that I have a string that contains markup and components, in the format specified, so querying the DOM is not possible as the markup has not been applied to the DOM. So, I am looking to extract the components based on a consistent pattern that I have defined. So just consider it as a string, with substrings that need to be pulled out. – Martyn Wynn Jan 19 '19 at 10:34

1 Answer

If you really want to do this with regex, you could try something like <[^(\/>)]+\/>:

var str = `<header>Student Directory</header>
<main>
<Student name="Pedro" age="23" />
    <div id="student1">
        <ul>
            <li>Maths</li>
            <li>English</li>
            <li>Swedish</li>
        </ul>
</div>
<Student name="Jane" age="15" />
</main>
<Footer />`

// Match "<", then one or more characters other than "(", "/", ">" or ")", then "/>"
var matches = str.match(/<[^(\/>)]+\/>/g)

console.log(matches)
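
The pattern above works for the sample, but it also matches lowercase self-closing tags such as <br />, and it fails on any component whose attributes contain "/" or parentheses. A slightly stricter variant that enforces the capital-letter rule from the question might look like the sketch below. The helper name extractComponents and the sample string are hypothetical, and the pattern assumes attribute values never contain ">".

```javascript
// Hypothetical helper (not part of the original answer): returns every
// self-closing tag whose name starts with a capital letter.
function extractComponents(markup) {
  // "<", one capital letter, any run of characters other than ">", then "/>".
  // Assumption: attribute values never contain ">".
  return markup.match(/<[A-Z][^>]*\/>/g) || [];
}

const sample = `<main>
<Student name="Pedro" age="23" />
<br />
<a href="/home">Home</a>
<Footer />
</main>`;

// Two matches: the Student tag and the Footer tag.
// The lowercase <br /> and the ordinary <a> element are skipped.
console.log(extractComponents(sample));
```

The `|| []` fallback is there because match returns null rather than an empty array when nothing matches.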
maioman
  • Not only does this answer the question perfectly and yield the desired result, but the quote offered earlier was also so true. Thank you so much for both the quote and the answer. – Martyn Wynn Jan 19 '19 at 10:48
  • Good to know that solves your problem. Sometimes regexes are just the most convenient tool :) – maioman Jan 19 '19 at 10:51