Get the actual XML tag using RegExp

Question

I have an XML document that looks something like this:

<drawing><some other tags><Picture><some other tags></drawing><drawing><some other tags><Chart><some other tags></drawing>

And I want to extract

<drawing><some other tags><Chart><some other tags></drawing>

Currently I am using this RegExp:

/<drawing>.*?<Chart>.*?</drawing>/g

However, it returns the whole XML, since that is also a valid match. I want only the second occurrence, but I have been unable to arrive at a solution. Thanks in advance.

[Are you parsing XML with Regex?!?!](http://stackoverflow.com/a/1732454/3622940) — Unihedron, Jul 29 '14 at 12:22
On a slightly more constructive note (hopefully), which flavour is your regex engine? — Unihedron, Jul 29 '14 at 12:33

score 1 · Accepted Answer · answered Jul 29 '14 at 12:23

With all the usual disclaimers about using regex to parse XML, if you want a regex solution, use this:

<drawing>(?:(?!</drawing>).)*?<Chart>.*?</drawing>

See the match in the Regex Demo.

Explanation

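<drawing> matches the literal characters
(?:(?!</drawing>).) matches one character, provided that </drawing> does not begin at that position
*? repeats this match lazily, so the match can never run past a closing </drawing>, up until...
<Chart> matches the literal characters
.*? lazily matches characters up until...
</drawing> matches the literal closing tag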
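
The difference between the two patterns can be checked with a short script. This is only a sketch: the thread never says which regex flavour is in use, so Python's `re` module is an assumption here, and the sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample modeled on the question; <x> and <y> stand in for "some other tags".
first = "<drawing><x><Picture><y></drawing>"
second = "<drawing><x><Chart><y></drawing>"
xml = first + second

# Pattern from the question: the first .*? may run past </drawing> to reach <Chart>.
naive = re.compile(r"<drawing>.*?<Chart>.*?</drawing>")

# Pattern from the answer: the lookahead stops the scan at the first </drawing>.
tempered = re.compile(r"<drawing>(?:(?!</drawing>).)*?<Chart>.*?</drawing>")

naive_matches = naive.findall(xml)
tempered_matches = tempered.findall(xml)

print(naive_matches)     # one match spanning both <drawing> blocks
print(tempered_matches)  # one match: only the block containing <Chart>
```

If the real document spans several lines, `.` will not match newlines by default in Python, so `re.DOTALL` would likely be needed as well.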

answered Jul 29 '14 at 12:23

zx81

Thanks, it worked! I tried non-capturing and negative lookahead, but not together. – yoogeeks Jul 29 '14 at 12:35
But be aware that it will sometimes fail, e.g., if certain character strings appear in comments or CDATA sections, or if the document uses whitespace in places where it is permitted but where you haven't allowed for it. – Michael Kay Jul 29 '14 at 15:00
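
The two failure modes mentioned in the comment above can be reproduced with small, made-up inputs. Again this is a sketch that assumes Python's `re` module; the sample strings are hypothetical and not taken from the asker's data.

```python
import re

# The accepted answer's pattern, unchanged.
pattern = re.compile(r"<drawing>(?:(?!</drawing>).)*?<Chart>.*?</drawing>")

# Whitespace before '>' is permitted in XML tags, but the literal "<drawing>" no longer matches.
spaced = "<drawing ><Chart></Chart></drawing>"
spaced_match = pattern.search(spaced)

# Here "<Chart>" appears only inside a comment, yet the pattern still reports a match.
commented = "<drawing><!-- <Chart> removed --><Picture></Picture></drawing>"
commented_match = pattern.search(commented)

print(spaced_match)                 # None: a real Chart drawing is missed
print(commented_match is not None)  # True: a drawing without a Chart element is matched
```

An XML parser would handle both inputs correctly, which is the point the comments are making.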
