
I want to match a string with a pattern like:

-TEXT-someMore-String

To get -TEXT-, I found out that this works:

/-(.+?)-/ // -TEXT-

As far as I know, ? makes the preceding token optional, as in:

colou?r matches both colour and color

I initially wrote the regex to get the -TEXT- part like this:

/-(.+)-/

But it gave -TEXT-someMore-.

How does adding ? make the regex stop at the -TEXT- part? As I understand it, ? makes the preceding token optional, as in the example above; I don't see how it makes the match stop at a certain point.

Dev555

4 Answers


As you say, ? sometimes means "zero or one", but in your regex +? is a single unit meaning "one or more — and preferably as few as possible". (This is in contrast to bare +, which means "one or more — and preferably as many as possible".)
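A minimal sketch of the difference, assuming Python's re module (the question doesn't name a language, but greedy and lazy + behave the same way there), run on the sample string from the question:

```python
import re

s = "-TEXT-someMore-String"

# Greedy: .+ first grabs everything to the end of the string, then gives
# characters back only until a "-" can match, so it ends at the LAST "-".
greedy = re.search(r"-(.+)-", s)

# Lazy: .+? takes one character, then tries "-" after each extra character,
# so it stops at the FIRST "-" that lets the rest of the pattern match.
lazy = re.search(r"-(.+?)-", s)

print(greedy.group(0))  # -TEXT-someMore-
print(lazy.group(0))    # -TEXT-
```

Both patterns match at the same starting position; only the preferred length of the .+ part differs.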

As the documentation puts it:

However, if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible, so the pattern /\*.*?\*/ does the right thing with the C comments. The meaning of the various quantifiers is not otherwise changed, just the preferred number of matches. Do not confuse this use of question mark with its use as a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in \d??\d which matches one digit by preference, but can match two if that is the only way the rest of the pattern matches.
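The two examples from the quoted documentation can be tried out with a short sketch, again assuming Python's re module; the sample code string is made up for illustration:

```python
import re

code = "a = 1; /* first */ b = 2; /* second */"

# Greedy .* runs from the first "/*" to the LAST "*/", swallowing the code between.
greedy = re.search(r"/\*.*\*/", code).group(0)

# Lazy .*? stops at the first "*/", giving just one comment.
lazy = re.search(r"/\*.*?\*/", code).group(0)

print(greedy)  # /* first */ b = 2; /* second */
print(lazy)    # /* first */

# \d?? prefers to match zero digits, so on its own \d??\d takes one digit...
print(re.match(r"\d??\d", "12").group(0))      # 1
# ...but it takes a second digit when the rest of the pattern requires it.
print(re.fullmatch(r"\d??\d", "12").group(0))  # 12
```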

ruakh
    Yes. `+` and `*` are called "greedy" operators: they consume as much as they can get away with. In contrast, `+?` and `*?` are the "non-greedy" versions: they consume as little as they can get away with. Search on the web for "regex greedy nongreedy" for more information/tutorials. – Edward Loper Mar 08 '12 at 19:54
  • @EdwardLoper it is a shame I can't fav comments. I will copy your comment to a .txt :D – ajax333221 Mar 08 '12 at 21:50

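Alternatively, you can use the ungreedy modifier U (a PCRE feature, available for example in PHP's preg_* functions) to make every quantifier in the whole regular expression prefer the shortest possible match. Note that U inverts greediness, so under U a quantifier followed by ? becomes greedy again: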

/-(.+)-/U
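If you are working in Python rather than PCRE, be aware that its re module has no ungreedy modifier: re.U exists, but it is the Unicode flag and leaves greediness unchanged. A small sketch, assuming Python 3's re, showing that the lazy quantifier has to be written out explicitly there:

```python
import re

s = "-TEXT-someMore-String"

# In Python, re.U is re.UNICODE (already the default for str patterns);
# it does NOT flip quantifiers to lazy the way PCRE's U modifier does.
with_u = re.search(r"-(.+)-", s, re.U).group(0)
print(with_u)  # -TEXT-someMore-  (still greedy)

# Python's equivalent of /-(.+)-/U is to make the quantifier lazy explicitly.
lazy = re.search(r"-(.+?)-", s).group(0)
print(lazy)    # -TEXT-
```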
Maxim Krizhanovsky

? after a token is shorthand for {0,1}, which means: zero or one occurrence of that token.

But + is not a token; it is a quantifier, shorthand for {1,}: one or more occurrences, with no upper limit.

A ? after a quantifier puts it into non-greedy (lazy) mode. In greedy mode, it matches as much of the string as possible. In non-greedy mode, it matches as little as possible.
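The equivalences in this answer can be checked with a short sketch, assuming Python's re module; the same lazy ? works after the {m,n} forms too:

```python
import re

# "?" after a token is the same as {0,1}: zero or one occurrence.
for word in ("color", "colour"):
    assert re.fullmatch(r"colou?r", word)
    assert re.fullmatch(r"colou{0,1}r", word)

# "+" is the same as {1,}: one or more occurrences, no upper limit.
assert re.fullmatch(r"a+", "aaa") and re.fullmatch(r"a{1,}", "aaa")

# A "?" placed after a quantifier (?, +, *, {m,n}) makes it lazy.
s = "-TEXT-someMore-String"
print(re.search(r"-(.{1,})-", s).group(0))   # -TEXT-someMore-  (greedy)
print(re.search(r"-(.{1,}?)-", s).group(0))  # -TEXT-           (lazy)
```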

yunzen

Another problem, perhaps the underlying one, is that your regex tries to match a run of arbitrary characters via .+?. What you really want is probably "any character except -", which you can write as [^-]+. In this case it doesn't matter whether the match is greedy or not: the repetition stops as soon as it reaches the second "-" in your string.
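A minimal sketch of the negated-class approach, assuming Python's re module, showing that the greedy and lazy versions give the same result:

```python
import re

s = "-TEXT-someMore-String"

# [^-]+ can never consume a "-", so the repetition is forced to stop at the
# next dash whether the quantifier is greedy or lazy.
greedy = re.search(r"-([^-]+)-", s).group(0)
lazy = re.search(r"-([^-]+?)-", s).group(0)

print(greedy)  # -TEXT-
print(lazy)    # -TEXT-
```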

drmirror