
I have the following regex: ([0-9]+)\/([0-9]*)\/([0-9]*) (properly escaped in my code). As you can see, it has three capture groups: one that must contain at least one digit, and two additional groups that may be empty.

I'm trying to run this over a string that should produce three matches for the entire regex. For example:

f 5/1/1 1/2/1 4/3/1

In this case, the result of the regex should be the following:

Match 1: 5/1/1, Group 1: 5, Group 2: 1, Group 3: 1

Match 2: 1/2/1, Group 1: 1, Group 2: 2, Group 3: 1

Match 3: 4/3/1, Group 1: 4, Group 2: 3, Group 3: 1

However, the way I understand it, C++11 can't return both the matches and the groups.

If I were to run the following code,

std::string input = "f 5/1/1 1/2/1 4/3/1";
std::regex rx("([0-9]+)\\/([0-9]*)\\/([0-9]*)");
std::smatch matches;
std::regex_search(input, matches, rx);

matches would have 10 elements: matches[0] would be everything from 5 to the end, and matches[1]-matches[9] would have the capture groups. But I'm not only trying to get the groups, I'm trying to get each of the matches (preferably with the groups organized by match).

As in: matches[1] would have 5/1/1, matches[2] would have 1/2/1, and matches[3] would have 4/3/1. Then, in something like (for example): groups[n] would have the corresponding group. Or, if possible, matches[1].groups would have the groups that were found within the match.

Is this correct? And/or is there some way to easily get both matches and capture groups?

Note: This is not a duplicate as other questions seem to be asking either about multiple matches or groups, not both at the same time.

PixelArtDragon
  • It is a duplicate, see https://stackoverflow.com/a/30495370/3832970 answer – Wiktor Stribiżew Sep 15 '19 at 20:47
  • @WiktorStribiżew I'm not just looking to get all the groups. I'm trying to get all the groups while at the same time getting the matches as strings. Not the same thing, hence why I indicated what would be the result of the regex and how that's different from the groups that I'd get in `matches`. – PixelArtDragon Sep 15 '19 at 20:49
  • You are using `regex_search` which is wrong. Use the regex iterator to get all matches. With each match, you have access to the submatches. That is how it is designed and that is what you will have to use. – Wiktor Stribiżew Sep 15 '19 at 20:52
  • Ok, but that is an answer to this present question, not the same as saying this is a duplicate question. – PixelArtDragon Sep 15 '19 at 20:56
  • No, that is the solution you seek, so it is. – Wiktor Stribiżew Sep 15 '19 at 21:11
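To make the comment about sub-matches concrete, here is a minimal sketch of what a single std::regex_search call returns for the question's input. The helper name first_match_parts is hypothetical and exists only for illustration. One call finds only the first match, so the result holds four entries (the whole match plus its three groups), not ten.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): runs ONE regex_search and
// returns every entry of the resulting match object as a string.
std::vector<std::string> first_match_parts(const std::string& input) {
    static const std::regex rx("([0-9]+)/([0-9]*)/([0-9]*)");
    std::smatch m;
    std::vector<std::string> parts;
    if (std::regex_search(input, m, rx)) {
        // m[0] is the whole first match; m[1]..m[3] are its capture groups.
        for (std::size_t i = 0; i < m.size(); ++i) {
            parts.push_back(m[i].str());
        }
    }
    return parts;
}
```

For the input f 5/1/1 1/2/1 4/3/1 this yields 5/1/1, 5, 1, 1. The later matches are only reached by searching again, which is what the regex iterator does.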

1 Answer


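The way to do this is to call std::regex_search() in a loop. It has
several overloads; the one that takes an iterator range lets you restart
the search just past the previous match. Each call finds one match, and
the match object holds that match's capture groups.

std::regex_iterator does the same thing behind the scenes.

Here you go (str is the std::string being searched).
There is a lot more to using regex in C++,
just let me know.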

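// str is the std::string to search; it must stay alive while m is in use,
// because m only stores iterators into it.
std::string::const_iterator start = str.begin();
std::string::const_iterator end   = str.end();
std::smatch m;

std::regex rx( "([0-9]+)/([0-9]*)/([0-9]*)" );

// Each successful call finds the next match in [start, end).
while ( std::regex_search( start, end, m, rx ) )
{
    std::string sWholeMatch = m[0].str();  // e.g. "5/1/1"
    std::string sGrp1 = m[1].str();        // always at least one digit
    std::string sGrp2 = m[2].str();        // may be empty
    std::string sGrp3 = m[3].str();        // may be empty

    // A length of 0 means the optional group matched nothing.
    std::size_t lenGrp1 = sGrp1.length();
    std::size_t lenGrp2 = sGrp2.length();
    std::size_t lenGrp3 = sGrp3.length();

    // Resume the search just past the end of this match.
    start = m[0].second;
}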
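Since the answer mentions that the regex iterator does the same thing behind the scenes, here is a minimal sketch of that variant, assuming the goal is one record per match with its groups attached. The names MatchWithGroups and find_all are hypothetical, chosen only for this illustration.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: one whole match plus its capture groups.
struct MatchWithGroups {
    std::string whole;
    std::vector<std::string> groups;
};

// Hypothetical helper: walks every match with std::sregex_iterator and
// copies the whole match and each group into owning strings.
std::vector<MatchWithGroups> find_all(const std::string& input,
                                      const std::regex& rx) {
    std::vector<MatchWithGroups> out;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(input.begin(), input.end(), rx), last;
         it != last; ++it) {
        const std::smatch& m = *it;
        MatchWithGroups entry;
        entry.whole = m[0].str();
        for (std::size_t g = 1; g < m.size(); ++g) {
            entry.groups.push_back(m[g].str());  // empty if the group matched nothing
        }
        out.push_back(std::move(entry));
    }
    return out;
}
```

Because the results are copied into std::string, they stay valid after the input string is gone, at the cost of one copy per match and group.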
  • I see, I'm used to other languages where there's no difference between using a search to match one result (and then the groups within that match) and matching many results. The C++ way isn't nearly as straightforward (plus I can see there being an issue if the matches overlap). – PixelArtDragon Sep 15 '19 at 21:27
  • No, it is always a single match that encompasses subgroups, _always_! You get all the info in the match object, which is nothing more than a repository of pointers into the target string. In the case of _overlapped matches_, the match itself is not of value; what matters is the capture group inside an assertion, which extends beyond the current position while nothing is consumed. For example, with `(?=([a-z]{3}))` nothing is consumed in group 0, but the engine auto-bumps the current position by 1 every time a zero-width match is made. –  Sep 15 '19 at 21:38
  • Ah, wow, that is even more complicated than I thought, but thank you for the explanation. – PixelArtDragon Sep 15 '19 at 21:40
  • Na, not so complex as all that. Note that the match object (_m_ in this case) has many methods and members you can browse with IntelliSense if your IDE supports it. –  Sep 15 '19 at 21:41
  • The match object itself I understood, it was how to get multiple match objects that was the issue. – PixelArtDragon Sep 15 '19 at 21:44
  • Well, you could get creative and copy each match object into its own smatch instance and add it to a vector. After the `while()` is done you can loop through the list of smatch objects. The caveat is that the original string must remain valid for as long as the smatch objects are in use. Realize though, you're just delaying what you're going to do after each match, and you gain nothing really. –  Sep 15 '19 at 21:48
  • Yeah, unless I need the matches afterwards there's nothing to be gained by copying them. And even if I do, copying these strings is negligible (but I can see the advantage of using pointers to an original string if they were going to be big). – PixelArtDragon Sep 15 '19 at 21:51
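The idea from the comments of keeping one smatch per match in a vector could be sketched as follows. The function name collect_matches is hypothetical, and the sketch assumes the pattern can never match an empty string (true for the regex in the question); otherwise the loop would not advance.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: copies each match object into a vector.
// The stored smatch objects only hold iterators into `input`, so
// `input` must outlive the returned vector.
std::vector<std::smatch> collect_matches(const std::string& input,
                                         const std::regex& rx) {
    std::vector<std::smatch> out;
    std::string::const_iterator start = input.cbegin();
    const std::string::const_iterator end = input.cend();
    std::smatch m;
    while (std::regex_search(start, end, m, rx)) {
        out.push_back(m);      // copies iterators, not characters
        start = m[0].second;   // continue just past this match
    }
    return out;
}

// Reject temporaries at compile time: their matches would dangle.
std::vector<std::smatch> collect_matches(std::string&&, const std::regex&) = delete;
```

Afterwards, matches[i][0].str() gives the i-th whole match and matches[i][1] to matches[i][3] give its groups, which is close to the layout asked for in the question.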