Why is it that "hello".match(/^(.*?)?/)[0] evaluates to "h" rather than to ""?

Put another way, why does following a lazy expression (.*?) with a zero-or-one quantifier ? make it a little greedy?

user541686
  • Interesting observation. But I'm not sure the answer will be very exciting. My guess is that a `?` quantifier will make a lot of zero-length matches and that's pretty useless - even `/^.?/` matches `"h"` only and I'm not sure I'd ever find value of it matching `""`, even if it's a valid result. – VLAZ Dec 02 '19 at 11:38
  • @VLAZ: I totally find value in it... in fact it actually resulted in a bug in my code because it stripped off a single character from the rest of the string when I wasn't expecting it to :( – user541686 Dec 02 '19 at 11:43
  • Further confusing - `"hello".match(/^(.*?){0,2}/)[0] # => "he"`. – ndnenkov Dec 02 '19 at 11:48
  • Well, think of it this way - if you put a `?` quantifier, and the regex engine *always* skips it, it will still maintain valid results. But `hello( world)?` will never match the full string. This makes the quantifier essentially useless unless in *very* specific circumstances – VLAZ Dec 02 '19 at 11:48
  • @VLAZ: Why skip it? Taking out the second `?` already results in a zero-length match. Putting the `?` back wouldn't need to skip that match at all. I'm fully expecting it to match. – user541686 Dec 02 '19 at 11:49
  • Indeed weird and causing different matches in different regex flavors. On the other hand I would not wonder when doing weird things, other weird things will happen :) – bobble bubble Dec 02 '19 at 11:52
  • @Mehrdad following your expectation of `?` first matching the *absence* of the quantified token(s), `"hello world".match(/hello( world)?/)` would result in `hello` in every situation. OK, in that case, we have a capturing group, but let's say `"hello world".match(/hello(?: world)?/)` - then there is no capturing group, hence no reason for the engine to prioritise the quantified token and thus it would produce `hello` only. But that's a useless regex, since it's the same as `/hello/` for all input. Hence why the engine likely resolves `( world)?` as `/( world|)/` instead. – VLAZ Dec 02 '19 at 11:53
  • @VLAZ: No... if you follow what I'm saying, `/hello( world)?/` would match `hello world`. Again -- the question mark is matching greedily, expecting the expression inside the parentheses to be present unless this isn't possible. So in my example, it would expect `.*?` to be present. Which of course it is -- it matches an empty string, just like it is in isolation. There's no reason for the second `?` to cause that to become *absent*. – user541686 Dec 02 '19 at 11:56
  • @Mehrdad that's the thing, it *transitively* implies that the `?` quantifier will match empty first. Since for `/^(.*?)?/` if the group resolves as zero-length match (which it would always) you'll get (the equivalent of) `(|)` - match empty or...empty. It's a rather useless thing to check for. – VLAZ Dec 02 '19 at 12:09
  • @VLAZ: Well, useless or not is a matter of opinion (as I mentioned, in my case this actively resulted in a bug), but it seems like a fact to me that it would not match `hello` like you're suggesting. – user541686 Dec 02 '19 at 12:12
  • regex101 shows that this is only the case in JavaScript: https://regex101.com/r/3oYGgl/1. Other flavours do what you'd expect and return a zero-length match. – Mark Whitaker Dec 02 '19 at 12:51

1 Answer

It's not that the inner quantifier has become greedy, but that it has tried to avoid matching a completely empty section. This is why the .*? still matches only the first character, not the whole word.

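This is an oddity of JavaScript regex. Empty matches inside a group with a greedy quantifier on it are handled slightly differently from other common regex engines. In short, once the quantifier's minimum count has been met, the ECMAScript specification rejects any further iteration of the group that matches the empty string, so the engine backtracks into the group and the lazy .*? has to consume one character. The full reasoning is intricate. See: Greediness behaving differently in JavaScript?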
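To see the effect concretely, here is a small sketch that compares the lazy group on its own with the two quantified variants from the question and the comments. It assumes a spec-conforming JavaScript engine such as Node.js; the helper name wholeMatch is hypothetical and only there to keep the comparison short.

```javascript
// Hypothetical helper: returns the text of the overall match (index 0).
function wholeMatch(re, input) {
  const m = input.match(re);
  return m === null ? null : m[0];
}

const results = {
  // Lazy group alone: free to match the empty string.
  lazyAlone: wholeMatch(/^(.*?)/, "hello"),    // ""
  // Greedy ? around it: the empty iteration is rejected, so one character is taken.
  optional: wholeMatch(/^(.*?)?/, "hello"),    // "h"
  // Greedy {0,2}: the same thing happens on each of the two iterations.
  upToTwo: wholeMatch(/^(.*?){0,2}/, "hello"), // "he"
};

console.log(results);
```

Each greedy repetition of the group ends up one character long, which is why the match grows with the upper bound of the outer quantifier rather than swallowing the whole word.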

A workaround is to make the outer quantifier lazy too, with an additional question mark:

"hello".match(/^(.*?)??/)[0] // output: ""
Boann