
Assume I have a string like this:

"a-b-c-d"

It consists of n = 4 sequences separated by "-".

Now I want to get the first n - 1 sequences ("a-b-c") and the last sequence ("d").

I can achieve this with the following code:

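#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string string{ "a-b-c-d" };

    // Two capture groups separated by a literal '-'.
    std::regex reg{ "^(.*)-(.*)$" };

    std::smatch match;
    if (std::regex_match(string, match, reg)) {
        std::cout << match.str(1) << '\n';
        std::cout << match.str(2) << '\n';
    }
}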

producing the expected output:

a-b-c
d

However, going purely by the grammar of this regex ("^(.*)-(.*)$"),

a
b-c-d

or

a-b
c-d

could also be valid matches of the string. After all, (.*) could be interpreted differently here: the first (.*) could decide to stop after the first character sequence, or after the second, and so on.

So my question: is std::smatch guaranteed to behave this way? Does std::smatch always look for the last occurrence of the pattern when given the option to capture with (.*)? Is there a way to tell std::smatch to look for the first occurrence rather than the last?

Stack Danny

1 Answer

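* is greedy, so the first (.*) matches as much as it can while still leaving the rest of the pattern (the "-" and the second (.*)) something to match. With the default ECMAScript grammar of std::regex this behavior is specified, so the result is deterministic: the split always happens at the last "-", which is the match you want.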

If you want the first group to be matched non-greedily, add a ? after the *:

^(.*?)-(.*)$

For your example input a-b-c-d this leaves you with a in the first capture group and b-c-d in the second.
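
As a rough sketch of the difference, the two patterns can be compared side by side. The helper names split_with, split_at_last_dash and split_at_first_dash are made up for illustration and are not part of the standard library; returning two empty strings on a failed match is likewise just an assumption of this sketch.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the question): applies a two-group pattern
// and returns both captures, or two empty strings if the input does not match.
std::pair<std::string, std::string> split_with(const std::string& input,
                                               const std::string& pattern) {
    const std::regex reg{ pattern };  // default grammar: ECMAScript
    std::smatch match;
    if (!std::regex_match(input, match, reg)) {
        return { "", "" };
    }
    return { match.str(1), match.str(2) };
}

// Greedy first group: takes as much as possible, so the split is at the last '-'.
std::pair<std::string, std::string> split_at_last_dash(const std::string& input) {
    return split_with(input, "^(.*)-(.*)$");
}

// Lazy first group: takes as little as possible, so the split is at the first '-'.
std::pair<std::string, std::string> split_at_first_dash(const std::string& input) {
    return split_with(input, "^(.*?)-(.*)$");
}

// Small self-check mirroring the example input from the question.
void self_check() {
    const std::string input{ "a-b-c-d" };
    assert(split_at_last_dash(input) ==
           std::make_pair(std::string{ "a-b-c" }, std::string{ "d" }));
    assert(split_at_first_dash(input) ==
           std::make_pair(std::string{ "a" }, std::string{ "b-c-d" }));
}
```

Calling self_check() from your own main should pass without tripping an assertion. Both patterns accept exactly the same set of strings; the ? only changes which of the possible splits is tried first.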

Max Langhof