I have a regex that should match only the <Properties> blocks that contain %%text%%, but it is capturing more than that.

My regex: (<Properties>).+?%%.+?%%.+?(<\/Properties>)

It matches:

"<Properties>
<Property>TEXT</Property>
</Properties>
<Properties>
<Property >%%TEXT%%</Property>
</Properties>"

But I want it to match only:

"<Properties>
<Property >%%TEXT%%</Property>
</Properties>"

What am I doing wrong?

2 Answers

Use a tempered greedy token instead of .:

<Properties>(?:(?!<\/Properties>)[^])*%%(.+?)%%(?:(?!<\/Properties>)[^])*<\/Properties>

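The part (?:(?!<\/Properties>)[^]) consumes one character at a time, but only while the current position is not the start of </Properties>. This makes sure the match cannot run past a closing </Properties> tag before reaching the wanted text.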

[^] matches any character, including newlines. This is JavaScript syntax; in most other regex flavors, use [\s\S] instead.
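As a rough check of the tempered greedy token, the sketch below runs the pattern against the sample from the question. It assumes a JavaScript engine (which [^] requires); the variable names lines, input, expected, tempered and m are only illustrative.

```javascript
// Sample input from the question: only the second block contains %%TEXT%%.
const lines = [
  "<Properties>",
  "<Property>TEXT</Property>",
  "</Properties>",
  "<Properties>",
  "<Property >%%TEXT%%</Property>",
  "</Properties>",
];
const input = lines.join("\n");
const expected = lines.slice(3).join("\n"); // the second block only

// Tempered greedy token: each consumed character must not start </Properties>.
const tempered =
  /<Properties>(?:(?!<\/Properties>)[^])*%%(.+?)%%(?:(?!<\/Properties>)[^])*<\/Properties>/;

const m = input.match(tempered);
console.log(m[0] === expected); // true: the first block is skipped
console.log(m[1]);              // TEXT
```

No dotAll flag is needed here, because [^] already crosses newlines and the captured (.+?) is expected to stay on one line.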

Toto

Let's break down the regex vs the actual match so you can see why it matches:

(<Properties>).+?%%.+?%%.+?(<\/Properties>)
  • (<Properties>) matches the first <Properties>.
  • .+? matches one or more characters until it encounters %%, thus matching <Property>TEXT</Property></Properties><Properties><Property > (including the closing tag of the first block).
  • %% matches %%.
  • .+? matches one or more characters until it encounters %%, thus matching TEXT.
  • %% matches %%.
  • .+? matches one or more characters until it encounters </Properties>, thus matching </Property>.
  • (<\/Properties>) matches </Properties>.
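The breakdown above can be reproduced with a short sketch. It assumes JavaScript and adds the s (dotAll) flag so that . can cross newlines, which the question does not state explicitly; the names input, original and overMatch are illustrative.

```javascript
// Sample input from the question: two blocks, only the second contains %%TEXT%%.
const input = [
  "<Properties>",
  "<Property>TEXT</Property>",
  "</Properties>",
  "<Properties>",
  "<Property >%%TEXT%%</Property>",
  "</Properties>",
].join("\n");

// Original pattern from the question. The s flag is an assumption.
const original = /(<Properties>).+?%%.+?%%.+?(<\/Properties>)/s;

const overMatch = input.match(original)[0];

// The match begins at the FIRST <Properties> and lazily expands until %% is
// found, so it ends up spanning both blocks.
console.log(overMatch === input); // true
```

Laziness only controls how far a quantifier extends to the right; it does not move the start of the match, which is why the first <Properties> is still included.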

Instead you want to make your regex more explicit:

(?:[^<%]|%(?!%)|<(?!\/Properties>))

The above matches a single character that is not < or %. If the character is one of those two, % is matched only when it is not followed by another %, and < is matched only when it is not followed by /Properties>. Because it is built on a negated character class, it also matches newlines. Use it as a replacement for your ., resulting in:

(<Properties>)(?:[^<%]|%(?!%)|<(?!\/Properties>))+%%(?:[^<%]|%(?!%)|<(?!\/Properties>))+%%(?:[^<%]|%(?!%)|<(?!\/Properties>))+(<\/Properties>)

Since the regex is more explicit, I can safely remove the lazy ? quantifier modifier.
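To keep the long pattern readable, one option is to assemble it from the single-character token. This is only a sketch in JavaScript; token, explicit, lines, input, expected and m are hypothetical names, and the RegExp constructor is used so the token can be reused as a string (so / needs no escaping).

```javascript
// Sample input from the question: only the second block contains %%TEXT%%.
const lines = [
  "<Properties>",
  "<Property>TEXT</Property>",
  "</Properties>",
  "<Properties>",
  "<Property >%%TEXT%%</Property>",
  "</Properties>",
];
const input = lines.join("\n");
const expected = lines.slice(3).join("\n"); // the second block only

// One character that starts neither %% nor </Properties>.
const token = "(?:[^<%]|%(?!%)|<(?!/Properties>))";

// Same structure as the full pattern above, with greedy + quantifiers.
const explicit = new RegExp(
  `(<Properties>)${token}+%%${token}+%%${token}+(</Properties>)`
);

const m = input.match(explicit);
console.log(m[0] === expected); // true: only the block containing %%TEXT%%
```

No flags are required, since the negated class [^<%] already matches newlines.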

3limin4t0r