3

I tried to use this regex: /^(\S+)(?:\?$|$)/

with the strings yolo and yolo?

It matches both, but for the second string (yolo?) the ? is included in the capturing group (\S+).

Is this a bug in the regex engine, or have I made a mistake?

Edit: I don't want the '?' to be included in the capturing group. Sorry for my bad English.

Mike H-R
Keltere

5 Answers

4

You can use one of the following:

  • If what you want to capture can't have a ? in it, use a negated character class [^...]:

    ^([^\s?]+)\??$
    
  • If what you want to capture can have a ? in it (for example, the input is yolo?yolo? and you want yolo?yolo), you need to make the + quantifier lazy by adding a ? after it:

    ^(\S+?)\??$
    
  • By the way, there is no need for a capturing group here. You can use a lookahead (?=...) instead and read the whole match:

    ^[^\s?]+(?=\??$)
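As a rough sketch of how the three patterns above behave, here they are in Python's re module. Python is my assumption (the question does not say which regex flavour is used), and the helper name capture is made up for illustration:

```python
import re

# The three patterns from the list above, written as Python raw strings.
NEGATED = re.compile(r"^([^\s?]+)\??$")      # negated character class
LAZY = re.compile(r"^(\S+?)\??$")            # lazy quantifier
LOOKAHEAD = re.compile(r"^[^\s?]+(?=\??$)")  # no group, lookahead only


def capture(pattern, text):
    """Return group 1 if the pattern has a group, else the whole match.

    Returns None when the pattern does not match at all.
    """
    m = pattern.search(text)
    if m is None:
        return None
    return m.group(1) if pattern.groups else m.group(0)


for text in ("yolo", "yolo?", "yolo?yolo?"):
    print(text, capture(NEGATED, text), capture(LAZY, text), capture(LOOKAHEAD, text))
```

All three give yolo for both yolo and yolo?. Only the lazy version matches yolo?yolo? (capturing yolo?yolo); the two patterns built on [^\s?] reject that input because they do not allow a ? in the middle.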
    

What was happening

The rules are: quantifiers (like +) are greedy by default, and the regex engine will return the first match it finds.

Consider what this means here:

  • \S+ will first match everything in yolo?, then the engine will try to match (?:\?$|$).
  • \?$ fails (we're already at the end of the string, so we now try to match an empty string and there's no ? left), but $ matches.

The regex has successfully reached its end, so the engine returns the match in which \S+ has matched the whole string and everything is in the first capturing group.
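The walkthrough above can be reproduced with a minimal sketch. I'm assuming Python's re module here, and the pattern is the one from the question with the /.../ delimiters dropped:

```python
import re

# Pattern from the question: greedy \S+ followed by the optional-? alternation.
ORIGINAL = re.compile(r"^(\S+)(?:\?$|$)")

m = ORIGINAL.search("yolo?")
captured = m.group(1)

# The greedy \S+ has already consumed the trailing ?, so the alternation
# falls through to its second branch ($) and the ? stays inside the group.
print(captured)                  # yolo?
print(m.end(1) == len("yolo?"))  # True: group 1 runs to the end of the string
```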

To match what you want, you have to make the quantifier lazy (+?) or prevent the character class (yeah, \S is a character class) from matching your ending delimiter ? (with [^\s?], for example).

Community
Robin
2

The regex below captures all the non-space characters and allows an optional trailing ?, but since \S also matches ?, the group still includes it:

^([\S]+)\??$


OR

^([\w]+)\??$


If you use \S+, it matches the ? character as well. To separate word characters from non-word characters, you could use the \w regex above instead. It captures only the word characters and then matches the optional ? that follows them.
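One thing to keep in mind with the \w variant is that it is narrower than \S. A small sketch, assuming Python's re module; the hyphenated input is my own example, not something from the question:

```python
import re

# \w only matches word characters (letters, digits, underscore).
WORD = re.compile(r"^(\w+)\??$")


def word_capture(text):
    """Return group 1 of the \\w-based pattern, or None if there is no match."""
    m = WORD.search(text)
    return m.group(1) if m else None


print(word_capture("yolo"))    # yolo
print(word_capture("yolo?"))   # yolo
print(word_capture("yo-lo?"))  # None: the hyphen is not a word character
```

So this works for plain words, but any input containing other non-space characters (hyphens, dots, and so on) will no longer match at all.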

Avinash Raj
2

This is the correct behaviour, since \S+ greedily matches one or more non-whitespace characters, and ? is one of them.

Thus the question mark is matched inside the (\S+) group, and the non-capturing group resolves to $. You could make it work as you expect by making the match non-greedy with:

/^(\S+?)(?:\?$|$)/


Alternatively, you could restrict the character class:

/^([^\s?]+)(?:\?$|$)/
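Both fixes keep the original (?:\?$|$) ending. A quick check, assuming Python's re module (so the /.../ delimiters are dropped) and using a made-up helper name group1:

```python
import re

# The two fixed patterns, keeping the original alternation at the end.
LAZY_ALT = re.compile(r"^(\S+?)(?:\?$|$)")
CLASS_ALT = re.compile(r"^([^\s?]+)(?:\?$|$)")


def group1(pattern, text):
    """Return the first capturing group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


results = {t: (group1(LAZY_ALT, t), group1(CLASS_ALT, t)) for t in ("yolo", "yolo?")}
print(results)
```

For both inputs, both patterns capture yolo without the trailing ?.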


Mike H-R
2

Make the + non-greedy:

^(\S+?)\??$
Toto
2

It does that because \S matches any non-whitespace character, including ?, and the + quantifier is greedy.

Following the + quantifier with ? for a non-greedy match will prevent this.

^(\S+?)\??$

Or use \w here, which matches any word character:

^(\w+)\??$
hwnd