5

I have this pattern:

(\w+)(sin|in|pak|red)$

And the replacement pattern is this:

$1tak

The problem is that this word:

setesin

will be transformed to:

setestak

instead of

setetak

For some reason, `in` always takes precedence over `sin` in the pattern, even though `sin` is listed first.

How can I force the pattern to try the alternatives in that order?

Wiktor Stribiżew
Cornwell

2 Answers

9

Use a lazy quantifier:

(\w+?)(sin|in|pak|red)$
    ^

See the regex demo

The `\w+` in the original pattern is greedy: it first grabs as many characters as it can (it matches `s`, `i`, and every other letter, digit, and underscore), then backtracks, giving up one character at a time from right to left so that the subsequent patterns can match. The first position at which the alternation fits is the one where only `in` is left, so `in` is matched, the whole group is considered matched, and the regex goes on to check the end of the string with `$`. With a lazy quantifier, `\w+?` matches one word character and the engine then tries the rest of the pattern, expanding `\w+?` one character at a time from left to right only when that fails, so the longer `sin` is reached before `in`.
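The difference described above can be checked with a short script. This is a minimal sketch in Python, which is an assumption since the question does not name a regex flavor; Python's `re` module writes the replacement as `\1tak` where the question uses `$1tak`.

```python
import re

word = "setesin"

# Greedy: \w+ takes the whole word first, then backtracks from the right,
# so the shortest suffix ("in") is the first alternative that fits.
greedy = re.compile(r"(\w+)(sin|in|pak|red)$")

# Lazy: \w+? starts with one character and expands from the left,
# so the longer suffix ("sin") is reached before "in".
lazy = re.compile(r"(\w+?)(sin|in|pak|red)$")

greedy_groups = greedy.search(word).groups()
lazy_groups = lazy.search(word).groups()

# \1 is Python's spelling of the $1 backreference used in the question.
greedy_result = greedy.sub(r"\1tak", word)
lazy_result = lazy.sub(r"\1tak", word)

print(greedy_groups, greedy_result)  # ('setes', 'in') setestak
print(lazy_groups, lazy_result)      # ('sete', 'sin') setetak
```

The order of the alternatives inside the group is the same in both patterns; only the amount of text that the first group leaves for the alternation differs.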

Wiktor Stribiżew
  • Thank you for the explanation! Will accept the answer in a few mins – Cornwell Nov 28 '16 at 13:39
  • Greedy quantifiers cause *backtracking*: the subsequent patterns are found at the rightmost locations. Lazy quantifiers cause subpattern *expansion* (a reverse backtracking), and the subsequent subpatterns are found at the leftmost locations. It is not accurate to say that lazy or greedy quantifiers define the order of pattern matching, but it looks as if they do. – Wiktor Stribiżew Nov 28 '16 at 13:42
  • @Cornwell Lazy means it will not stop matching once it's matched the `in` but will continue checking, which means it will eventually match `sin`. It will only match `in` if that is an exact match, i.e. there is no `sin` to match – bixarrio Nov 28 '16 at 13:44
  • Also, see [more examples of how lazy and greedy quantifiers work](http://stackoverflow.com/questions/33869557/can-i-improve-performance-of-this-regular-expression-further/33869801#33869801). – Wiktor Stribiżew Nov 28 '16 at 13:46
3

Don't use a quantifier at all:

(\w)(?:sin|in|pak|red)$

with the same replacement

or

\B(?:sin|in|pak|red)$

with tak as replacement. The non-word-boundary \B ensures that there is a word character before the suffix. (If a word character before the alternation isn't mandatory, remove the \B.)

With both approaches, the leftmost occurrence of a suffix is found first, because there is no greedy quantifier to consume part of it beforehand.
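Both quantifier-free patterns can be tried out as follows. This is a small sketch in Python (an assumed flavor, as the question names none); the helper names `replace_single_char` and `replace_non_boundary` are hypothetical and exist only for this illustration, and `\1` stands in for `$1`.

```python
import re

# Shared suffix alternation, anchored at the end of the string.
SUFFIXES = r"(?:sin|in|pak|red)$"

# Variant 1: capture exactly one word character before the suffix.
single_char = re.compile(r"(\w)" + SUFFIXES)

# Variant 2: require a non-word-boundary before the suffix, capture nothing.
non_boundary = re.compile(r"\B" + SUFFIXES)


def replace_single_char(word: str) -> str:
    # The captured character is put back with \1 (Python's form of $1).
    return single_char.sub(r"\1tak", word)


def replace_non_boundary(word: str) -> str:
    # Nothing is captured, so the replacement is just the new suffix.
    return non_boundary.sub("tak", word)


for w in ["setesin", "kopak", "in"]:
    print(w, replace_single_char(w), replace_non_boundary(w))
```

The word `in` on its own is left unchanged by both variants, because neither pattern can find a word character in front of the suffix.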

Casimir et Hippolyte