Java RegExp: Finding the correct regular expression

Question

I am struggling to find the correct regular expression for extracting strings that meet the following criteria:

I have an XML fragment with multiple tags. Each element starts with <ABC_xxxx> and ends with </ABC_xxxx>.

The xxxx changes for each element. For example:

 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>

etc...

The number of characters in xxxx is not fixed!

I want to extract each element, including the tags themselves.

How can I do that?

Also - [obligatory link](http://stackoverflow.com/a/1732454/2071828). Don't do it. Use XPath. — Boris the Spider, Sep 28 '16 at 10:47

Sina Iravanian · Answer 1 · 2016-09-28T11:17:28.473

Assuming that there will be no such elements nested inside each other, try this:

\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>

Explanation:

\<ABC(\w+)\> is the opening tag, which starts with ABC. The word characters after ABC (including the underscore, e.g. _A1S1) are captured in a group (hence the parentheses), because we need them later.
[^\<]+ is the body of the element, which is one or more characters of any kind except an opening angle bracket.
\<\/ABC(\1)\> is the closing tag, which starts with ABC and must be followed by exactly the same characters that followed ABC in the opening tag. \1 is a reference to the first captured group.
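To show how the pattern explained above might be applied from Java, here is a minimal sketch. The class name `AbcElementExtractor` and the method `extract` are made up for illustration. The pattern is written without the backslashes before `<`, `>` and `/`, since Java's regex syntax does not require them, and the backreference is doubled to `\\1` because it sits inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class AbcElementExtractor {
    // Opening tag, non-empty body without '<', closing tag with the same suffix.
    private static final Pattern ELEMENT =
            Pattern.compile("<ABC(\\w+)>[^<]+</ABC\\1>");

    static List<String> extract(String xml) {
        List<String> elements = new ArrayList<>();
        Matcher m = ELEMENT.matcher(xml);
        while (m.find()) {
            // group() with no argument is the whole match, tags included
            elements.add(m.group());
        }
        return elements;
    }

    public static void main(String[] args) {
        String xml = "<ABC_A1S1>1234</ABC_A1S1>\n"
                   + "<ABC_uw3ey>1234</ABC_uw3ey>\n"
                   + "<ABC_PD4frfr5>1234</ABC_PD4frfr5>";
        extract(xml).forEach(System.out::println);
    }
}
```

Run against the sample from the question, this should print each of the three elements on its own line. An element whose closing tag name differs from its opening tag name is skipped, because the backreference fails to match.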

Important Note: XML is not a regular language, so regular expressions cannot parse it in general. For example, imagine two or more such elements nested inside each other. Use an XML parser to parse XML.
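As a rough illustration of the parser-based route, the sketch below uses the JDK's built-in DOM parser and XPath. It assumes the fragment is well-formed once wrapped in a single enclosing element; the `<root>` wrapper, the class name `AbcXPathExtractor` and the `name=text` output format are choices made for this example only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class AbcXPathExtractor {
    // Returns "name=text" for every element whose name starts with "ABC_".
    static List<String> extract(String fragment) {
        try {
            // Several top-level elements do not form a well-formed document,
            // so wrap them in a made-up <root> element before parsing.
            String wrapped = "<root>" + fragment + "</root>";
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc =
                    builder.parse(new InputSource(new StringReader(wrapped)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*[starts-with(name(), 'ABC_')]",
                    doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                result.add(n.getNodeName() + "=" + n.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ABC_A1S1>1234</ABC_A1S1>"
                   + "<ABC_uw3ey>1234</ABC_uw3ey>"
                   + "<ABC_PD4frfr5>1234</ABC_PD4frfr5>";
        extract(xml).forEach(System.out::println);
    }
}
```

Unlike the regular expression, this approach also finds ABC_ elements that are nested inside one another. If the full markup of each element is needed rather than its name and text, the matched nodes could be serialized back to strings, for example with `javax.xml.transform`.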

Mustofa Rizwan · Answer 2 · answered Sep 28 '16 at 10:56 · edited Nov 03 '21 at 05:38 by Nimantha · score 0

<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>
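One thing worth noting about the pattern `<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>` is that it has no backreference, so on its own it does not require the closing tag name to equal the opening one. The sketch below is an assumption about how it could be used from Java, not something the answer spells out: it compares groups 1 and 3 by hand. The class name `AbcGroupsDemo` and the method `bodyIfNamesMatch` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class AbcGroupsDemo {
    // Group 1: opening name suffix, group 2: body, group 3: closing name suffix.
    private static final Pattern ELEMENT =
            Pattern.compile("<ABC_([^>]*)>([^<]*)</ABC_([^>]*)>");

    // Returns the body when the input is a single element whose opening and
    // closing names agree, otherwise null.
    static String bodyIfNamesMatch(String element) {
        Matcher m = ELEMENT.matcher(element);
        if (m.matches() && m.group(1).equals(m.group(3))) {
            return m.group(2);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(bodyIfNamesMatch("<ABC_A1S1>1234</ABC_A1S1>"));
        System.out.println(bodyIfNamesMatch("<ABC_a>1</ABC_b>"));
    }
}
```

The second call illustrates the difference: the pattern itself matches the mismatched tags, and only the explicit comparison of the two captured names rejects them.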
