-3

I am struggling to find the correct regular expression for extracting strings that meet the following criteria:

I have an XML fragment with multiple tags. Each element starts with <ABC_xxxx> and ends with </ABC_xxxx>.

The xxxx changes for each element. For example:

 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>

etc...

The number of x characters is not fixed!

I want to extract each element, including the tags themselves.

How can I do that?

Nimantha
dushkin

2 Answers

1

Assuming that there will be no such elements nested inside each other, try this:

\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>

Explanation:

  • \<ABC(\w+)\> is the opening tag, which starts with ABC. The characters after ABC are captured in a group (hence the parentheses) because we need them later.
  • [^\<]+ is the body of the element: one or more characters, none of which is an opening angle bracket.
  • \<\/ABC(\1)\> is the closing tag, which starts with ABC and must continue with exactly the characters that followed ABC in the opening tag. \1 is a back-reference to the first captured group.
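As a rough illustration, the pattern above could be applied like this in Python. This is only a sketch: the question does not say which language or regex flavor is in use, and the helper name `extract_elements` and the sample text (including the deliberately mismatched last element) are made up for the example. In some flavors (GNU grep, Vim) `\<` and `\>` mean word boundaries, so the backslashes before the angle brackets may need to be dropped there.

```python
import re

# Same pattern as in the answer. In Python's re module the backslashes
# before <, > and / are redundant but harmless.
PATTERN = re.compile(r"\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>")

def extract_elements(text):
    """Return each complete element, opening and closing tags included."""
    # group(0) is the whole match, i.e. the element with its tags.
    return [m.group(0) for m in PATTERN.finditer(text)]

# Hypothetical sample: the last element has a closing tag that does not
# match its opening tag, so the back-reference rejects it.
sample = """
 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>
 <ABC_X1>oops</ABC_Y2>
"""

for element in extract_elements(sample):
    print(element)
```

Because `[^\<]+` requires at least one character, an empty element such as `<ABC_x></ABC_x>` would not be matched by this pattern.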

Important Note: XML is not a regular language, so regular expressions cannot parse it in general. For example, imagine two or more such elements nested inside each other. Use an XML parser to parse XML.
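For comparison, a parser-based version might look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. It assumes the fragment is otherwise well-formed, has no XML declaration, and uses no namespaces. The function name `extract_abc_elements` and the wrapper element name `root` are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

def extract_abc_elements(fragment):
    """Return every element whose tag starts with ABC_, serialized with its tags."""
    # A fragment with several top-level elements is not a well-formed
    # document, so wrap it in a throwaway root element first.
    root = ET.fromstring(f"<root>{fragment}</root>")
    result = []
    for elem in root.iter():  # document order, nested elements included
        if elem.tag.startswith("ABC_"):
            elem.tail = None  # drop whitespace that follows the closing tag
            result.append(ET.tostring(elem, encoding="unicode"))
    return result

fragment = """
 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>
"""

for element in extract_abc_elements(fragment):
    print(element)
```

Unlike the regular expression, this also handles nested elements: both the outer element and the inner one are returned, each as a complete string.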

Sina Iravanian
0
<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>
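One thing to keep in mind with this pattern is that it captures the opening and closing names in separate groups (1 and 3) without requiring them to be equal, so an element with mismatched tags still matches. A minimal Python sketch of one way to filter those out is shown below. The helper name `extract_matching` and the sample string are hypothetical, and Python is only an assumed host language.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>")

def extract_matching(text):
    """Keep only matches whose opening and closing tag names are identical."""
    return [
        m.group(0)
        for m in PATTERN.finditer(text)
        if m.group(1) == m.group(3)  # group 1: opening name, group 3: closing name
    ]

# Hypothetical input: the second element has mismatched tags.
sample = "<ABC_A1S1>1234</ABC_A1S1> <ABC_X1>oops</ABC_Y2>"

for element in extract_matching(sample):
    print(element)
```

Without the comparison, both elements in the sample would be reported as matches.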
Nimantha
Mustofa Rizwan