
I'm parsing an XML file with Node.js and a RegExp, but I can't find a way to extract all the children of a given parent. For example, I need every FormalName="(.+)" value under the parent PARENT1.

<TopicSet FormalName="PARENT1">
    <Topic>
      <TopicType FormalName="Child1" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child2" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child3" />
    </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
    <Topic>
      <TopicType FormalName="Child1" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child2" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child3" />
    </Topic>
</TopicSet>

I tried this:

<TopicSet FormalName="PARENT1">(?:(?:\s|\S)*?)TopicType FormalName="(.+)"(?:(?:\s|\S)*?)<\/TopicSet>

But it only returns the first occurrence (Child1) under PARENT1, not Child1, Child2 and Child3.

https://regex101.com/r/3ESH29/2/

Emma
visconti

2 Answers


It is not advisable to parse XML with a regex.

Instead, you can use a DOMParser and, for example, querySelectorAll to get the FormalName values under PARENT1:

Example using jsdom:

let xml = `<TopicSet FormalName="PARENT1">
    <Topic>
      <TopicType FormalName="Child1" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child2" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child3" />
    </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
    <Topic>
      <TopicType FormalName="Child1" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child2" />
    </Topic>
    <Topic>
      <TopicType FormalName="Child3" />
    </Topic>
</TopicSet>`;

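// DOMParser is not a Node.js global; take it from a jsdom window.
const { JSDOM } = require("jsdom");
const { DOMParser } = new JSDOM("").window;
// The sample has two top-level elements, so wrap it in a single root to keep it well-formed.
let doc = new DOMParser().parseFromString(`<root>${xml}</root>`, "text/xml");
let res = doc.querySelectorAll("TopicSet[FormalName='PARENT1'] Topic TopicType");
res.forEach(e => console.log(e.getAttribute("FormalName")));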
The fourth bird

Regular expressions may not be the best tool for this. However, if you have to use them, you can create three capturing groups, using the parent's opening and closing tags as the left and right boundaries and capturing everything in between:

(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)
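
To make the three groups concrete, here is a minimal sketch that runs the expression with String.prototype.matchAll and labels each group. The shortened sample and the names blockRegex and blocks are illustrative only, not part of the original answer:

```javascript
// Illustrative input: two sibling TopicSet blocks, shortened from the question.
const xml = `<TopicSet FormalName="PARENT1">
  <Topic><TopicType FormalName="Child1" /></Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic><TopicType FormalName="Child2" /></Topic>
</TopicSet>`;

// matchAll requires the g flag.
const blockRegex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/g;

// One entry per TopicSet:
// group 1 = opening tag, group 2 = inner content, group 3 = closing tag.
const blocks = [...xml.matchAll(blockRegex)].map(m => ({
  open: m[1],
  inner: m[2],
  close: m[3],
}));

console.log(blocks.length);  // number of TopicSet blocks found
console.log(blocks[0].open); // opening tag of the first block
```

Because the middle group is lazy, each match stops at the nearest closing tag, so the two sibling blocks come back as separate matches. This assumes TopicSet elements are never nested.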


RegEx

If this isn't the expression you wanted, you can modify it on regex101.com.

RegEx Circuit

You can also visualize your expressions on jex.im.

JavaScript Demo

const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
 <Topic>
   <TopicType FormalName="Child1" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child2" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child3" />
 </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
 <Topic>
   <TopicType FormalName="Child1" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child2" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child3" />
 </Topic>
</TopicSet>`;
const subst = `$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

JavaScript Demo 2

If you also want to keep the parent tags in the output, replace with $1$2$3 instead of $2. The three capturing groups are there so that each part is easy to reference:

const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
 <Topic>
   <TopicType FormalName="Child1" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child2" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child3" />
 </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
 <Topic>
   <TopicType FormalName="Child1" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child2" />
 </Topic>
 <Topic>
   <TopicType FormalName="Child3" />
 </Topic>
</TopicSet>`;
const subst = `$1$2$3`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);



If you only want to extract the first parent, you can make the left boundary specific to it:

(<TopicSet FormalName="PARENT1">)([\s\S]*?)(<\/TopicSet>)
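
The expression above isolates the PARENT1 block but does not yet list its children. One way to get there is a second pass over the captured content. The sketch below assumes the markup looks exactly like the sample (double-quoted attributes, FormalName immediately after the tag name, no nested TopicSet), and childFormalNames is a hypothetical helper name, not something from the original answer:

```javascript
// Sample shortened from the question.
const xml = `<TopicSet FormalName="PARENT1">
  <Topic><TopicType FormalName="Child1" /></Topic>
  <Topic><TopicType FormalName="Child2" /></Topic>
  <Topic><TopicType FormalName="Child3" /></Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic><TopicType FormalName="Child1" /></Topic>
</TopicSet>`;

// Hypothetical helper: FormalName values of the TopicType elements
// inside the TopicSet whose FormalName equals parentName.
function childFormalNames(source, parentName) {
  // Escape regex metacharacters before embedding the name in a pattern.
  const escaped = parentName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Step 1: capture the content of the requested parent block only.
  const block = new RegExp(
    `<TopicSet FormalName="${escaped}">([\\s\\S]*?)</TopicSet>`
  ).exec(source);
  if (block === null) return [];
  // Step 2: collect every FormalName inside that block (g flag needed for matchAll).
  const names = block[1].matchAll(/<TopicType FormalName="([^"]+)"/g);
  return [...names].map(m => m[1]);
}

console.log(childFormalNames(xml, "PARENT1")); // [ 'Child1', 'Child2', 'Child3' ]
```

Splitting the work into two passes sidesteps the limitation from the question: within a single match, a capturing group holds only one value, so one expression cannot return all three children at once. A real XML parser is still the more robust choice if the markup can vary.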


Emma