
I have an XML string that looks something like this:

<drawing><some other tags><Picture><some other tags></drawing><drawing><some other tags><Chart><some other tags></drawing>

I want to extract just this element:

<drawing><some other tags><Chart><some other tags></drawing>

Currently I am using this RegExp:

/<drawing>.*?<Chart>.*?<\/drawing>/g

However, it returns the whole XML string, because that is also a valid match. I want only the second occurrence but have not been able to arrive at a solution. Thanks in advance.
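A minimal reproduction of the problem, assuming JavaScript (as the /.../g literal suggests), with <other/> as a hypothetical stand-in for the elided "some other tags". The lazy .*? only keeps each quantifier as short as possible for a match that starts at the first <drawing>; nothing stops it from crossing the first </drawing> to reach <Chart>.

```javascript
// Hypothetical sample: "<other/>" stands in for the "some other tags" in the question.
const xml =
  "<drawing><other/><Picture><other/></drawing>" +
  "<drawing><other/><Chart><other/></drawing>";

// The question's pattern ("/" escaped inside the literal).
const original = /<drawing>.*?<Chart>.*?<\/drawing>/g;

const matches = xml.match(original);
console.log(matches.length);     // 1
console.log(matches[0] === xml); // true: the single match spans both elements
```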

yoogeeks

1 Answer


With all the usual disclaimers about using regex to parse XML, if you want a regex solution, use this:

<drawing>(?:(?!</drawing>).)*?<Chart>.*?</drawing>


Explanation

  • <drawing> matches the literal characters
  • (?:(?!</drawing>).) matches one character that does not start </drawing>, so the match can never run past the end of the current drawing element
  • *? repeats this match lazily, up to...
  • <Chart>, which matches the literal characters
  • .*? lazily matches characters up to...
  • </drawing>, the closing tag
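The answer's pattern can be assembled from the pieces in the explanation above. This is a sketch assuming JavaScript, with <other/> as a hypothetical stand-in for the elided tags; building the pattern from a string with the RegExp constructor avoids having to escape the / in </drawing>.

```javascript
// Hypothetical sample: "<other/>" stands in for the "some other tags" in the question.
const xml =
  "<drawing><other/><Picture><other/></drawing>" +
  "<drawing><other/><Chart><other/></drawing>";

const chartDrawing = new RegExp(
  "<drawing>" +             // literal opening tag
  "(?:(?!</drawing>).)*?" + // lazily, chars that do not start "</drawing>"
  "<Chart>" +               // literal <Chart> tag, found inside the same element
  ".*?" +                   // lazily, any chars...
  "</drawing>",             // ...up to the closing tag
  "g"
);

// The first <drawing> cannot reach <Chart> without crossing "</drawing>",
// so the only match starts at the second <drawing>.
const matches = xml.match(chartDrawing);
console.log(matches[0]); // <drawing><other/><Chart><other/></drawing>
```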
zx81
  • Thanks, it worked! I tried non-capturing and negative lookahead, but not together. – yoogeeks Jul 29 '14 at 12:35
  • But be aware that it will sometimes fail, e.g., if certain character strings appear in comments or CDATA sections, or if the document uses whitespace in places where it is permitted but where you haven't allowed for it – Michael Kay Jul 29 '14 at 15:00
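Michael Kay's caveats are easy to reproduce. The sketch below assumes JavaScript (ES2018 or later for the s flag) and uses hypothetical inputs, each a well-formed drawing element containing <Chart>. Adding the s (dotAll) flag handles line breaks, but legal whitespace inside a tag and a </drawing> inside a comment still defeat the pattern.

```javascript
// The answer's pattern with the "s" (dotAll) flag so "." also matches line breaks.
const chartDrawing = /<drawing>(?:(?!<\/drawing>).)*?<Chart>.*?<\/drawing>/gs;

// Hypothetical inputs, each a valid drawing element that contains <Chart>.
const multiLine = "<drawing>\n<Chart>\n</drawing>";                  // pretty-printed
const spacedTag = "<drawing ><Chart></drawing>";                     // legal space before ">"
const withComment = "<drawing><!-- </drawing> --><Chart></drawing>"; // end-tag text in a comment

// Without "s", "." does not match "\n", so the multi-line element is missed.
const noDotAll = new RegExp(chartDrawing.source, "g");
console.log(multiLine.match(noDotAll));            // null
console.log(multiLine.match(chartDrawing).length); // 1

// These still fail: the pattern knows nothing about XML whitespace rules or comments.
console.log(spacedTag.match(chartDrawing));   // null
console.log(withComment.match(chartDrawing)); // null
```

If inputs like these can occur, an XML parser is the safer choice.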