
I have a regular expression with two capture groups, and I am having a hard time understanding why the last character ends up captured alone by this regex:

Regex: `([A-Za-z]+).*([A-Za-z]+)`
String example: `Hello World`

I don't get why the `d` is captured by itself. I know it has something to do with `.*`, but I can't wrap my head around it when testing on https://www.regex101.com
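
A minimal sketch in Python reproduces the result. It assumes Python's standard `re` module as a stand-in for regex101; these simple patterns behave the same in both. `greedy_split` is a hypothetical helper written only for illustration. It holds group 1 fixed at `Hello`, lets `.*` take everything after it, then hands characters back one at a time until `[A-Za-z]+` can match at the new position.

```python
import re
from typing import Optional, Tuple

PATTERN = r"([A-Za-z]+).*([A-Za-z]+)"
TEXT = "Hello World"

m = re.match(PATTERN, TEXT)
print(m.groups())  # ('Hello', 'd')


def greedy_split(text: str, start: int) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: mimic how a greedy `.*` starting at `start`
    backtracks. It tries the longest possible span first and shrinks it
    until the second group `[A-Za-z]+` can match right after it.
    (Group 1 is held fixed here; the real engine could also backtrack it.)"""
    for end in range(len(text), start - 1, -1):
        second = re.match(r"[A-Za-z]+", text[end:])
        if second:
            return text[start:end], second.group()
    return None


# Group 1 greedily took "Hello" (indices 0-4), so `.*` starts at index 5.
# Taking all of " World" leaves nothing for group 2, so one character
# (the "d") is given back, and that is the first point where group 2 matches.
print(greedy_split(TEXT, 5))  # (' Worl', 'd')
```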

If I put the `.*` at the front of the expression, only the `l` and the `d` are captured. Why is that?
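
Under the same assumption (Python's `re` standing in for regex101), and assuming the question means the original regex with `.*` prepended, the leading `.*` behaves the same way: it is greedy, so it keeps as much of the string as it can and gives back only enough for each group to capture at least one letter.

```python
import re

TEXT = "Hello World"

# Assumed reading of the question: the original regex with `.*` prepended.
front = re.match(r".*([A-Za-z]+).*([A-Za-z]+)", TEXT)

# The leading `.*` backs off until it holds "Hello Wor"; group 1 first
# tries "ld", backs off to "l", and the middle `.*` ends up matching
# nothing so that group 2 can take "d".
print(front.groups())  # ('l', 'd')
```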

Any help appreciated, thanks

Nick
lucyb
    The `.*` is greedy: it grabs every character from the space up to, but not including, the `d`. Make it non-greedy (lazy) with a `?`, i.e. `([A-Za-z]+).*?([A-Za-z]+)` – Nick Apr 16 '20 at 00:51
    But why stop at `d`? How come it doesn't stop at `l` or `r`? – lucyb Apr 16 '20 at 01:05
    Because it grabs as many characters as it can *without* preventing the regex from matching. – Nick Apr 16 '20 at 01:08
    That makes sense! Thank you! – lucyb Apr 16 '20 at 01:14
    It would not match the `z`, though, right? – lucyb Apr 16 '20 at 01:26
    Lucy, I misspoke in my earlier comment. For the string `abczefgzhijzk`, the regular expression `[a-z]*z` matches `abczefgzhijz`. The regular expression `[a-z]*?z` (or `[a-y]*z`) matches `abcz`. If you want to match up to the first `z` (`abc`), use `[a-y]*` (or `[^z]*`). If you want to match up to the last `z` (`abczefgzhij`), use `[a-z]*(?=z)`, `(?=z)` being a *positive lookahead*. – Cary Swoveland Apr 16 '20 at 02:47
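
The lazy fix from the comments and Cary Swoveland's examples can be checked with a short Python sketch, again assuming Python's `re` module as a stand-in for regex101. `re.match` anchors at the start of the string, which is where each of these matches begins anyway.

```python
import re

# Nick's fix: the lazy `.*?` takes as few characters as possible, so the
# second group starts at the first letter after the space.
lazy = re.match(r"([A-Za-z]+).*?([A-Za-z]+)", "Hello World")
print(lazy.groups())  # ('Hello', 'World')

# Cary Swoveland's examples, all applied to the same string.
s = "abczefgzhijzk"
patterns = [
    r"[a-z]*z",      # greedy: up to and including the last z
    r"[a-z]*?z",     # lazy: up to and including the first z
    r"[a-y]*z",      # z excluded from the class: same result as the lazy form here
    r"[a-y]*",       # up to, but not including, the first z
    r"[^z]*",        # same, written as a negated class
    r"[a-z]*(?=z)",  # positive lookahead: up to, but not including, the last z
]
results = {p: re.match(p, s).group() for p in patterns}
for p, matched in results.items():
    print(f"{p:14} -> {matched}")
```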
