I have been working hard to get a regular expression to work, but I'm stuck on the last part. My goal is to remove an XML element when it is contained within specific parent elements. The example XML looks like this:
<ac:image ac:width="500">
<ri:attachment ri:filename="image2013-10-31 11:21:16.png">
<ri:page ri:content-title="Banana Farts" /> <!-- REMOVE THIS -->
</ri:attachment>
</ac:image>
The expression I have written is:
(<ac:image.*?>)(<ri:attachment.*?)(<ri:page.*? />)(</ri:attachment></ac:image>)
In a more readable format, I am searching for four groups:
(<ac:image.*?>) //Find open image tag
(<ri:attachment.*?) //Find open attachment tag
(<ri:page.*? />) //Find the page tag
(</ri:attachment></ac:image>) //Find the closing attachment and image tags
This basically works: I can remove the page element in Notepad++ by using this as the replacement:
\1\2\4
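For anyone who wants to try this outside Notepad++, here is a minimal sketch of the same replacement in Python. It rests on two assumptions: Python's `re` module stands in for the Notepad++ regex engine (the two may differ in details), and the markup sits on one line with no whitespace between tags, since the fourth group only matches that way. The names `PATTERN`, `single_image` and `cleaned` are just for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r'(<ac:image.*?>)(<ri:attachment.*?)(<ri:page.*? />)(</ri:attachment></ac:image>)'
)

# Assumption: the real markup is a single line with no whitespace between tags.
single_image = (
    '<ac:image ac:width="500">'
    '<ri:attachment ri:filename="image2013-10-31 11:21:16.png">'
    '<ri:page ri:content-title="Banana Farts" />'
    '</ri:attachment>'
    '</ac:image>'
)

# Keep groups 1, 2 and 4; dropping group 3 removes the <ri:page ... /> element.
cleaned = PATTERN.sub(r'\1\2\4', single_image)
print(cleaned)
```

With a single image element the page tag disappears and everything else is kept.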
My issue is that the search matches too much. In an example like the one below, it grabs everything from the first opening image tag to the last closing image tag, when really only the second image element is a valid match.
<ac:image ac:width="500">
<ri:attachment ri:filename="image2013-10-31 11:21:16.png" />
</ac:image>
<ac:image ac:width="500">
<ri:attachment ri:filename="image2013-10-31 11:21:16.png">
<ri:page ri:content-title="Employee Portal Editor" />
</ri:attachment>
</ac:image>
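The over-matching can be reproduced with a short Python sketch. As before, this assumes Python's `re` module behaves like the Notepad++ engine for this pattern and that the markup is on one line with no whitespace between tags. The names `PATTERN`, `two_images` and `match` are hypothetical.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r'(<ac:image.*?>)(<ri:attachment.*?)(<ri:page.*? />)(</ri:attachment></ac:image>)'
)

# Assumption: single-line markup. The first image has a self-closing attachment
# and no <ri:page>; only the second image should be matched.
two_images = (
    '<ac:image ac:width="500">'
    '<ri:attachment ri:filename="image2013-10-31 11:21:16.png" />'
    '</ac:image>'
    '<ac:image ac:width="500">'
    '<ri:attachment ri:filename="image2013-10-31 11:21:16.png">'
    '<ri:page ri:content-title="Employee Portal Editor" />'
    '</ri:attachment>'
    '</ac:image>'
)

match = PATTERN.search(two_images)

# The match begins at the first <ac:image>, not the second one.
print(match.start())
# Group 2 runs across the first </ac:image> until it reaches <ri:page.
print(match.group(2))
```

The lazy `.*?` in the second group still expands as far as it needs to in order to find a `<ri:page`, so it crosses the boundary of the first image element.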
Can anyone help me finish this up? I thought all I had to do was add ? to make the closing tag group non-greedy, but that did not work.