
I'm new to Perl and am working with regular expressions. I can't work out how Perl resolves the ambiguity when a regex could match a given string in more than one way. For example:

  • ('hellohellohello' =~ m/h.*o/)

    This could match 'hello', 'hellohello' or 'hellohellohello'. Which one will it choose: the shortest or the longest match? And how do we get the opposite behavior (for example, the longest match if the default is the shortest)?

  • If the answer to the first question is the longest match, consider

    ("hello\nhellohello" =~ m/h.*o/)

    Here, it could match on the first line (before the newline character) or on the second line (after the newline character), so the choice is between the first match and the longest match. Which one will it use?

What is the complete set of rules that decides which substring of a string a given regex matches (including cases other than the examples above where multiple matches are possible)?

him
  • Look into greedy and non-greedy quantifiers. – Robby Cornelissen May 24 '19 at 08:08
  • Regex with `g` flag does not usually work like this, once a match is found, the regex index is advanced. So, generally, you can't match several times at one and the same location. In Perl6, however, this is solved. – Wiktor Stribiżew May 24 '19 at 08:10
  • Read https://www.regular-expressions.info/repeat.html particularly the section about greediness. – Barmar May 24 '19 at 08:15
  • @WiktorStribiżew He's not asking to get all the matches, he just wants to know which one it will match. – Barmar May 24 '19 at 08:16
  • @Barmar Then OP needs something like `m/(?:h[^o]o){1,x}/` where `x` controls how many times `h...o` repeats. Else, I should have kept it closed as another [My regex is matching too much. How do I make it stop?](https://stackoverflow.com/questions/22444) dupe. – Wiktor Stribiżew May 24 '19 at 08:21
  • Why does he "need" that? He never said what he wanted to match, he just asked how it normally matches. – Barmar May 24 '19 at 08:23
  • @Barmar See *could match 'hello', 'hellohello' or 'hellohellohello'*. What you suggest matches only `hello` (non-greedy) or `hellohellohello` (greedy) in `'hellohellohello'`. – Wiktor Stribiżew May 24 '19 at 08:23
  • Those are the potential matches because he doesn't know how `*` limits itself. – Barmar May 24 '19 at 08:24

1 Answer


`*` is greedy: it tries to match the longest possible string, as long as the rest of the pattern can still match. So `m/h.*o/` will match `hellohellohello`.

If you use `*?` instead, that makes it non-greedy, and it will match the shortest possible string, again as long as the rest of the pattern matches. So `m/h.*?o/` will match `hello`.
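
A quick way to check this is to capture what each pattern matched. This is only a minimal sketch; the variable names `$str`, `$greedy` and `$lazy` are mine, not from the question.

```perl
use strict;
use warnings;

my $str = 'hellohellohello';

# Greedy: .* takes as much as it can, then backs off until 'o' can match.
my ($greedy) = $str =~ /(h.*o)/;

# Non-greedy: .*? takes as little as it can, growing only when 'o' fails.
my ($lazy) = $str =~ /(h.*?o)/;

print "greedy: $greedy\n";   # greedy: hellohellohello
print "lazy:   $lazy\n";     # lazy:   hello
```

Both matches start at the first `h`; the quantifier only decides where the match ends.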

Barmar
  • Just greedy/non-greedy does not fully solve the issue. See *could match 'hello', 'hellohello' or 'hellohellohello'*. What you suggest matches only `hello` (non-greedy) or `hellohellohello` (greedy) in `'hellohellohello'`. – Wiktor Stribiżew May 24 '19 at 08:24
  • You're not understanding the question. He wants to know which of the possible matches it actually matches. And greediness is used to determine that. – Barmar May 24 '19 at 08:25
  • To be precise, in all cases it will match the *first* possible match from left to right. Greediness only determines whether it will be the shortest or longest match starting from that position, even if it's not the shortest or longest match possible overall. – Grinnz May 24 '19 at 14:26
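
The leftmost-first rule from the last comment, and the newline case from the question, can be checked with a short sketch. The test strings `'hxxxo ho'` and `"hello\nhellohello"` and the variable names are illustrative assumptions; the second string is the question's example with an explicit newline.

```perl
use strict;
use warnings;

# The lazy match still starts at the leftmost 'h', so it returns 'hxxxo'
# rather than the shorter 'ho' further to the right.
my ($lazy) = 'hxxxo ho' =~ /(h.*?o)/;

# The greedy match also starts at the leftmost 'h'. Without /s, '.' does
# not match a newline, so the match cannot extend into the second line.
my $two_lines = "hello\nhellohello";
my ($first_line) = $two_lines =~ /(h.*o)/;

# With /s, '.' matches newlines too, so the greedy match spans both lines.
my ($both_lines) = $two_lines =~ /(h.*o)/s;

print "lazy:       $lazy\n";                                   # hxxxo
print "first line: $first_line\n";                             # hello
print "with /s:    ", length($both_lines), " characters\n";    # 16
```

So for the second example in the question, the default is the match on the first line; the `/s` modifier is what lets `.*` run across the newline.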