I have a regex containing several sub-groups that are joined by alternation (an "or" condition):

([[:alpha:]]+)|([[:digit:]]+)

When I match the string 1 a 2, I get three matches: 1, a and 2.

Is there a way in C++ to determine which of the sub-patterns matched?

jim
  • Create a minimal running example. Mention which regex library you're using. The answer right now is yes, no more details can be given. – CodeMonkey Jul 06 '17 at 13:06
  • Why do you need to know which group is matched? – Passer By Jul 06 '17 at 13:14
  • If you're using `std::regex` you can pass a `match_results<...>` object when you do the match. That will hold an array of `sub_match<...>` objects, and you can iterate through that array to find the one(s) that matched. – Pete Becker Jul 06 '17 at 13:25

1 Answer

Not directly.

With the std::regex library, the std::match_results class holds the sub-matches. Its size() member (std::match_results::size) tells you how many sub-matches there are.

Ex:

std::string str( "one two three four five" );
std::regex rx( "(\\w+)(\\w+)(\\w+)(\\w+)(\\w+)" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

std::cout << mr.size() << '\n'; // 6  

Here the output is 6, not 5, because the whole match is counted as well. You can access each entry with the .str( number ) method or with operator[].

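Sub-matches are numbered from left to right, so index 1 is the first group, index 2 the second, and so on. Note that after a successful search size() is always the number of groups plus one, whether or not every group took part in the match, so size() alone cannot tell you which group matched. For that, check the matched member of each sub_match (for example mr[ 1 ].matched).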

If you change the rx to "(\\w+)(\\d+)(\\w+)" then size() is 0, because the search fails (the string contains no digits).

If you change the rx to "(\\w+).+" then the size is 2. That means you have one successful whole match and one sub-match.

Ex:

std::string str( "one two three four five" );
std::regex rx( "(\\w+).+" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

std::cout << mr.str( 1 ) << '\n'; // one
std::cout << mr[ 1 ] << '\n';     // one

The output for both is: one

If you want to print only the sub-matches, you can use a simple loop whose index starts at 1 rather than 0:

Ex:

std::string str( "one two three four five" );
std::regex rx( "(\\w+) \\w+ (\\w+) \\w+ (\\w+)" );
std::match_results< std::string::const_iterator > mr;

std::regex_search( str, mr, rx );

for( std::size_t index = 1; index < mr.size(); ++index ){
    std::cout << mr[ index ] << '\n';
}

The output is:

one
three
five  

If by "determine which of the sub-patterns matched" you mean choosing which sub-match the search should return, then the answer is yes. std::regex_token_iterator lets you do that:

Ex: (iterate over the second sub-match of each match)

std::string str( "How are you today ? I am fine . How about you ?" );
std::regex rx( "(\\w+) (\\w+) ?" );
std::match_results< std::string::const_iterator > mr;

std::regex_token_iterator< std::string::const_iterator > first( str.begin(), str.end(), rx, 2 ), last;

while( first != last ){
    std::cout << first->str() << '\n';
    ++first;
} 

The last parameter is 2, as in ( str.begin(), str.end(), rx, 2 ), which means you want only the second sub-match. So the output is:

are
today
am
about
Shakiba Moshiri