
There is probably a really simple explanation but I just don't see it right now... I have this regex:

(\s.+?\sstress)

I want it to match something like [SPACE]some word[SPACE]stress. However, it matches too much:

This will cause a lot of work stress 

will match: will cause a lot of work stress
But .+? should be non-greedy so I expected it to only match work stress.
Click here to open this in regex101.

CodeNoob

1 Answer


.+? is non-greedy, but the regex engine works from left to right. The first \s matches the leftmost whitespace, and since . can match any character other than a line break (whitespace included), the lazily quantified .+? has to keep expanding until it reaches a whitespace followed by the stress substring.
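As a rough illustration of that behaviour, here is a minimal sketch using Python's built-in `re` module (the variable names and the second sample sentence are made up for the example). It suggests that laziness only picks the shortest match from a given starting position; it does not move the starting position to the right.

```python
import re

text = "This will cause a lot of work stress"

# The engine anchors the match at the leftmost whitespace (after "This"),
# then expands .+? one character at a time until "\sstress" can follow.
lazy_match = re.search(r"\s.+?\sstress", text).group(0)
print(repr(lazy_match))

# Laziness does matter when several endings are possible from the same start.
two_hits = "a lot of work stress and more stress"
lazy_two = re.search(r"\s.+?\sstress", two_hits).group(0)    # stops at the first "stress"
greedy_two = re.search(r"\s.+\sstress", two_hits).group(0)   # runs to the last "stress"
print(repr(lazy_two))
print(repr(greedy_two))
```

Both patterns start at the same leftmost whitespace; the quantifier only decides where the match ends.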

To just get work stress, use

\s(\S+\sstress)

or just

\S+\s+stress

See the regex demo.

The main point here is to exclude whitespace matching between the first \s and the second \s in the regex. \S+ matches one or more non-whitespace characters and is a more restrictive pattern than the dot (.).
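The two suggested patterns can be tried out with a short sketch like the one below, again assuming Python's built-in `re` module (the variable names are illustrative only). Because \S+ cannot cross whitespace, any attempt that starts at an earlier word fails and the engine moves on until it reaches the word right before stress.

```python
import re

text = "This will cause a lot of work stress"

# Variant 1: match the leading whitespace, but capture only the part after it.
grouped = re.search(r"\s(\S+\sstress)", text).group(1)

# Variant 2: no leading whitespace at all, so the whole match is the result.
whole = re.search(r"\S+\s+stress", text).group(0)

print(repr(grouped))  # 'work stress'
print(repr(whole))    # 'work stress'
```

The second variant also tolerates more than one whitespace character between the two words, since it uses \s+ rather than a single \s.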

Wiktor Stribiżew
  • Is this left-to-right approach always the case? In other words, is this the case in every programming language? – CodeNoob May 05 '17 at 22:40
  • Can you please explain the second regex? – CodeNoob May 05 '17 at 22:41
  • In .NET and PyPI regex, you may tell the regex engine to parse the string from right to left using a special regex option. See [this regex demo](http://regexstorm.net/tester?p=%5cs.%2b%3f%5csstress&i=This+will+cause+a+lot+of+work+stress&o=r&s=36). I updated the answer with a short description of the main change: using the `\S` shorthand character class. It matches the same as `[^\s]`, i.e. any char but whitespace. – Wiktor Stribiżew May 05 '17 at 22:42
  • @CodeNoob: Is it fine now, or should I add more details? – Wiktor Stribiżew May 05 '17 at 22:46
  • Oow I see, never knew that \S existed hahah thankyou! – CodeNoob May 05 '17 at 22:46
  • @CodeNoob: These are called reverse/opposite character classes: `\d` ~ `\D`, `\s` ~ `\S`, `\w` ~ `\W`, `\p{UNICODE_CATEGORY}` ~ `\P{UNICODE_CATEGORY}`. Even `\b` (word boundary) and `\B` (any position other than a word boundary) follow the same "behavior". – Wiktor Stribiżew May 05 '17 at 22:47
  • Thank you for the thorough explanation! Learned something today :) – CodeNoob May 05 '17 at 22:50
  • @CodeNoob: Also, please read [this answer of mine to learn how lazy quantifiers work](http://stackoverflow.com/questions/33869557/can-i-improve-performance-of-this-regular-expression-further/33869801#33869801) (the *Difference between `.*?`, `.*` and `[^"]*+` quantifiers* section). – Wiktor Stribiżew May 05 '17 at 22:52
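
The opposite shorthand classes mentioned in the comments can be checked with a small sketch like this one, assuming Python's built-in `re` module and a made-up sample string. The `\p{...}` / `\P{...}` pair is left out because the standard `re` module does not support it.

```python
import re

sample = "ab 12"

# Each lowercase class and its uppercase counterpart split the characters
# of the string into two complementary groups.
digits = re.findall(r"\d", sample)
non_digits = re.findall(r"\D", sample)
spaces = re.findall(r"\s", sample)
non_spaces = re.findall(r"\S", sample)

print(digits, non_digits)
print(spaces, non_spaces)

# \S behaves like the negated class [^\s].
same_as_negated = re.findall(r"[^\s]", sample) == non_spaces
print(same_as_negated)
```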