
I have to remove the string between two delimiters, i.e., from "123XabcX321" I want "123321". For a simple case, I'm fine with:

$_=<>;
s/X(.*)X//;
print;

But if the input is ambiguous, like "123XabcXasdfjXasdX321", the pattern matches from the first X to the last X and I get "123321", whereas I want "123asdfj321". Is there a way to specify an "eager" match that pairs the first X with the nearest valid closing delimiter rather than the last one?

GClaramunt

2 Answers


It's normally called "ungreedy" (or "non-greedy") matching. You put a ? after the quantifier: s/X(.*?)X//;
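
As a rough illustration of the difference, the sketch below runs the greedy and non-greedy substitutions on the sample input from the question. The variable names are only illustrative, and the /g variant follows the note in the comments that a single substitution removes just one delimited span.

```perl
use strict;
use warnings;

my $input = "123XabcXasdfjXasdX321";

# Greedy: .* grabs as much as possible, so the first X pairs with the last X.
(my $greedy = $input) =~ s/X(.*)X//;

# Non-greedy: .*? grabs as little as possible, so only the first X...X span goes.
(my $lazy_once = $input) =~ s/X(.*?)X//;

# Non-greedy with /g: every X...X span is removed in turn.
(my $lazy_all = $input) =~ s/X(.*?)X//g;

print "$greedy\n";     # 123321
print "$lazy_once\n";  # 123asdfjXasdX321
print "$lazy_all\n";   # 123asdfj321
```

Without /g the second delimited span stays in the string, which is why the output the question asks for needs the global modifier.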

Anomie
  • And in the example given, /g would be needed to substitute more than once. – ysth Mar 28 '11 at 03:06
  • I think "non-greedy" is the more common term. At any rate, the default is greedy matching, and you want the opposite. – cjm Mar 28 '11 at 03:39
  • Note that in GNU `grep` you'll need to use `--perl-regexp` (`-P`) for the lazy operator (or use the approach below). [reference](http://stackoverflow.com/questions/3027518/non-greedy-grep) – bgamari Aug 28 '14 at 21:52
  • Be aware of this: `"XaXbXY" =~ /X(.*?)XY/` => `aXb` – ikegami May 02 '17 at 13:46
  • @ikegami: Aware of what, it working exactly as it's supposed to? – Anomie May 03 '17 at 10:54
  • Be aware that this solution is extremely fragile. It's practically impossible to use it in a large pattern; a million different tiny changes can break it since adding `?` doesn't prevent `.*` from matching anything. – ikegami May 03 '17 at 14:17
  • It's not at all fragile, you just need to understand how it works so you're not making incorrect assumptions. BTW, consider `/X(?>(.*?)X)Y/` if you really want X as a delimiter while still having the extra Y suffix in there too. – Anomie May 03 '17 at 19:26

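Avoid the non-greedy modifier as anything but a performance hint if you can. Using it can lead to "unexpected" results because adding ? doesn't actually prevent .* from matching anything: it only makes the engine try the shortest match first, and if the rest of the pattern fails there, .*? is extended, even across an X. For example,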

$ perl -le'print for "XaXbXY" =~ /X(.*?)XY/;'
aXb

To avoid matching X, you can use the following:

s/X[^X]*X//g;
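
To make the contrast concrete, here is a small sketch that runs both patterns on the same "XaXbXY" string and then applies the negated-class substitution to the input from the question. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "XaXbXY";

# .*? starts short but is extended across an X when the trailing XY fails to match.
my ($lazy) = $text =~ /X(.*?)XY/;

# [^X]* can never contain an X, so the match has to start at the second X.
my ($negated) = $text =~ /X([^X]*)XY/;

print "$lazy\n";     # aXb
print "$negated\n";  # b

# The same character class applied to the original problem.
(my $stripped = "123XabcXasdfjXasdX321") =~ s/X[^X]*X//g;
print "$stripped\n";  # 123asdfj321
```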

If X is really something larger than one character, you can use the following:

s/X(?:(?!X).)*X//g;
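
A minimal sketch of the multi-character case, assuming a hypothetical delimiter "--" (the question does not say what the real delimiter is). Wrapping the delimiter in \Q...\E is an addition here, so that any regex metacharacters in it are taken literally.

```perl
use strict;
use warnings;

# Hypothetical two-character delimiter, chosen only for illustration.
my $delim = "--";
my $input = "123--abc--asdfj--asd--321";

# (?:(?!\Q$delim\E).)* consumes one character at a time, but only at
# positions where the delimiter does not begin, so it cannot run past
# the nearest closing delimiter.
(my $out = $input) =~ s/\Q$delim\E(?:(?!\Q$delim\E).)*\Q$delim\E//g;

print "$out\n";  # 123asdfj321
```

Note that . does not match a newline by default, so if the delimited text can span lines, the /s modifier would be needed as well.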

ikegami
  • interesting... I'll try that. In my case, the X is more than one char. I'll have to decipher ?:(?!X) tho – GClaramunt Mar 28 '11 at 23:20
  • @GClaramunt, `(?: )` in regex patterns is like `( )` in Perl code: it only groups. In this case, it indicates that `*` affects `(?!X).` instead of just `.`. `( )` is frequently misused for this purpose. – ikegami Mar 29 '11 at 22:53
  • @GClaramunt, `(?! )` checks that what follows doesn't match the contained pattern. – ikegami Mar 29 '11 at 22:53
  • Why do you prefer that? – Ali Shakiba Jan 22 '15 at 20:47
  • @JohnS, Why don't I use `(...).*?x(...)` to prevent `.*` from matching `x`? Because it doesn't. Non-greediness provides a performance hint; it's not for preventing `.*` from matching. Using non-greediness as anything other than a performance hint is a fragile hack. – ikegami Jan 22 '15 at 21:02
  • Thanks, interesting. Is there any ref explaining it? I googled but didn't find anything. – Ali Shakiba Jan 23 '15 at 19:43
  • The docs clearly say that `?` is the non-greediness modifier. Are you asking how `(...).*?x(...)` doesn't prevent `.*` from matching `x`? `perl -E'say "abcxdefx123x" =~ /^(.*?)x\d+x/'` – ikegami Jan 23 '15 at 20:17
  • This is an interesting example, but I would say here `?` prevents `.*` from matching `x\d+x` (that is, `x123x`), not just `x`. Anyway, I think your point is important when using non-greediness and a good reason to consider using negation instead. – Ali Shakiba Jan 25 '15 at 11:16