Notepad++ Regex to find group of lines with condition

Question

Given this example text:

<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
...lines...
...........
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
...lines...
...........
</abr:rules> (end of group)

I would like to find and remove everything from <abr:rules> to </abr:rules>, on the condition that subapplication is NOT "CM". Organization and application are always the same; <abr:code> can be any string.

What I tried so far is

<abr:rules>\n<abr:ruleTypeDefinition>\n<abr:code>[a-zA-Z0-9]{3,}<\/abr:code>\n<abr:ownership>\n<.*"(FM|PSD|SSC)"\/>\n(?s).*?\n<\/abr:rules>\n

which works but only because I know the other subapplication names.

Is there any way to do this with regex alone, without listing the other subapplication names?

Why don't you use Python and an XPath query (for example) instead of an editor that isn't designed for your task? — Casimir et Hippolyte, Apr 13 '18 at 20:53

Tim Biegeleisen · Accepted Answer · 2018-04-13T15:46:56.733

Try the following find and replace:

Find:

<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"((?!</abr:rules>).)*</abr:rules>

Replace:

(empty string)

Note: The above pattern only works if the ". matches newline" option is enabled in the Notepad++ Find/Replace dialog. If you don't want to enable it, you can use [\S\s] in place of the dot.
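
For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same tempered-dot idea. It assumes that Python's re.DOTALL flag stands in for ". matches newline", and it appends an optional \n so that no empty line is left behind. The sample is a shortened version of the question's text with the elided lines dropped, and the function name remove_non_cm_groups is hypothetical.

```python
import re

# Tempered-dot pattern from the answer. Each "(?:(?!X).)*" consumes text one
# character at a time, but refuses to step onto the start of X, so a match
# can never run past the first subapplication attribute or past the closing
# tag of its own group.
PATTERN = re.compile(
    r'<abr:rules>(?:(?!subapplication=).)*'   # up to the first subapplication=
    r'subapplication="(?!CM")[^"]+"'          # any value except exactly CM
    r'(?:(?!</abr:rules>).)*</abr:rules>'     # rest of this group only
    r'\n?',                                   # swallow the trailing newline
    re.DOTALL,                                # "." also matches newlines
)

# Shortened sample based on the question (elided lines left out).
SAMPLE = """<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
</abr:rules>
"""


def remove_non_cm_groups(text: str) -> str:
    """Delete every <abr:rules> group whose subapplication is not CM."""
    return PATTERN.sub("", text)


cleaned = remove_non_cm_groups(SAMPLE)
print(cleaned)
```

The negative lookahead (?!CM") includes the closing quote, so a value that merely starts with CM (for example a hypothetical "CMX") would still be treated as non-CM and removed.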

edited Apr 13 '18 at 15:46

answered Apr 13 '18 at 15:05

Tim Biegeleisen


I like your solution but it's matching 'CM' groups as well. I will post a bigger sample for testing – farbiondriven Apr 13 '18 at 15:14
@farbiondriven I fixed it by using a tempered dot which can't jump over the ending tag to the next group. – Tim Biegeleisen Apr 13 '18 at 15:31
Sorry Man, still it's getting groups with CM as well. – farbiondriven Apr 13 '18 at 15:43
@farbiondriven I now use a tempered dot everywhere. – Tim Biegeleisen Apr 13 '18 at 15:47
Bingo !! Working like a charm ! I only needed to put a \n at the end to avoid it leaving an empty line. – farbiondriven Apr 13 '18 at 15:51

score 2 · Answer 2 · answered Apr 13 '18 at 16:08

You should not use regex for XML; you can read why here: https://stackoverflow.com/a/1732454/3763374

Instead, you can use some parser like XPath.
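
As a rough illustration of the parser-based route, here is a minimal sketch using Python's standard xml.etree.ElementTree. It rests on several assumptions that are not in the question: the abr prefix is bound to a made-up namespace URI (urn:example:abr), the groups sit under a made-up wrapper element abr:ruleSet, and the closing tags elided in the question are filled in so the sample is well-formed. The function name keep_only_cm is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real document will declare its own.
NS = {"abr": "urn:example:abr"}

# Well-formed sample: the wrapper element and the closing tags are assumptions.
SAMPLE = """<abr:ruleSet xmlns:abr="urn:example:abr">
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
</abr:ownership>
</abr:ruleTypeDefinition>
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
</abr:ownership>
</abr:ruleTypeDefinition>
</abr:rules>
</abr:ruleSet>"""


def keep_only_cm(xml_text: str) -> ET.Element:
    """Parse the text and drop each abr:rules group whose owner is not CM."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing children while looping is safe.
    for rules in root.findall("abr:rules", NS):
        owner = rules.find(".//abr:owner", NS)
        # Groups without an owner element are left alone in this sketch.
        if owner is not None and owner.get("subapplication") != "CM":
            root.remove(rules)
    return root


root = keep_only_cm(SAMPLE)
remaining_codes = [code.text for code in root.iterfind(".//abr:code", NS)]
print(remaining_codes)
```

The attribute test is done in Python rather than inside the path expression, because ElementTree implements only a subset of XPath.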

answered Apr 13 '18 at 16:08

jixbo


True, for this specific case xml parser would have been simpler. The accepted answer can be applied for any text. – farbiondriven Apr 13 '18 at 16:13
I won't upvote or downvote your answer because you are right, but: 1) It should be posted as a comment (since it isn't really an answer, more a piece of advice), 2) even if the linked question has many upvotes, the accepted answer is more a joke than a useful answer, 3) XPath isn't a parser, it's a query language. – Casimir et Hippolyte Apr 13 '18 at 21:02
