1

Here is a very simple regex:-

kabk
k(.*)k
result:
kabk has 1 group:
ab

What I normally do is the following, which is right but unnecessary.

k([^k]*)k

Can anyone explain to me how the regex parser work? Is the .* supposed to consume as much as possible.

ΩmegaMan
  • 29,542
  • 12
  • 100
  • 122
leon
  • 435
  • 1
  • 4
  • 12
  • 2
    possible duplicate of [What do lazy and greedy mean in the context of regular expressions?](http://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) –  Mar 07 '14 at 04:35

4 Answers4

3

Yes, the * operator is greedy, and will capture as many valid characters as it can.

For example, the pattern k(.*)k applied to kkkkak will capture kkka.

You can make an operator non-greedy with the ? operator.

So the pattern k(.*?)k applied to kkkkak will capture nothing, because the smallest non-greedy match is to allow nothing between the first two k characters.

In reply to your comment, the existance of the final k in the pattern means that it needs to leave as much as possible to still match a k after consuming as much as possible.

  • @user764357 here is a suggestion to simplify your excellent example. String `kkkkakbbb` will give the following results: GREEDY PATTERN: `/k.*k/` will return **kkkkak**. `*` is greedy, and will keep searching till it no longer matches) NON-GREEDY PATTERN: `/k.*?k/` will return **kk**kkak. `*?` is non-greedy, and will return as soon as the complete match is found) – dank8 Mar 02 '23 at 05:58
2

In kabk, there's only two ks. So it's not proper example to learn about greedy match.

Let's try it with different example: kabkxyk

Greedy version match returns kabkxyk (.* advances before the last k): (Followng is javascript):

'kabkxyk'.match(/k.*k/)
// ["kabkxyk"]

while non-greedy version match returns kabk:

'kabkxyk'.match(/k.*?k/)
// ["kabk"]
falsetru
  • 357,413
  • 63
  • 732
  • 636
  • Thanks. if put greed aside, why not .* in my example consume all the character after the first 'k', which find no match. – leon Mar 07 '14 at 04:38
  • 1
    @leon, The regular expression `.*` match greedily **as long as the entire pattern matches**. – falsetru Mar 07 '14 at 04:40
2

Lets see k([^k]*)k (matching using negation) first on regex101.com debugger::

1   /k([^k]*)k/ kabk
2   /k([^k]*)k/ kabk
3   /k([^k]*)k/ kabk
4   /k([^k]*)k/ kabk
5   /k([^k]*)k/ kabk
6   /k([^k]*)k/ kabk
7   /k([^k]*)k/ kabk
#   Match found in 7 steps.

Now lets see k(.*)k (matching using geedy match) on same regex101.com debugger:

1   /k(.*)k/ kabk
2   /k(.*)k/ kabk
3   /k(.*)k/ kabk
4   /k(.*)k/ kabk
5   /k(.*)k/ kabk
6   /k(.*)k/ kabk BACKTRACK
7   /k(.*)k/ kabk BACKTRACK
8   /k(.*)k/ kabk
9   /k(.*)k/ kabk
#   Match found in 9 steps.

Clearly first regex is much more efficient in longer strings as there is no BACKTRACKing involved.

anubhava
  • 761,203
  • 64
  • 569
  • 643
0

Yes, .* is greedy. It matches as much as it can.

.*? is non-greedy, or parsimonious. It matches as little as it can.

If you match "bananas" against ba.*a, it will match banana.

If you match "bananas" against ba.*?a, it will match bana.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152