After learning basic c++ rules,I specialized my focus on std::regex
, creating two console apps: 1.renrem
and 2.bfind
.
And I decided to create some convenient functions to deal with regex
in c++ as easy as possible plus all with std
; named RFC ( = regex function collection )
There are several strange things that always make me surprise, but this one ruined all my attempt and those two console apps.
One of the important functions is count_match
that counts number of match inside a string. Here is the full code:
unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){
const bool flags_has_i = flags.find( "i" ) < flags.size();
const bool flags_has_g = flags.find( "g" ) < flags.size();
std::regex::flag_type regex_flag = flags_has_i ? std::regex_constants::icase : std::regex_constants::ECMAScript;
// std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
std::regex rx( user_pattern, regex_flag );
std::match_results< std::string::const_iterator > mr;
unsigned int counter = 0;
std::string temp = user_string;
while( std::regex_search( temp, mr, rx ) ){
temp = mr.suffix().str();
++counter;
}
if( flags_has_g ){
return counter;
} else {
if( counter >= 1 ) return 1;
else return 0;
}
}
First of all, as you can see, the line for search_flag
was commented because it is ignored by std::regex_search
and I do not know why? since -- the exact flag is accepted for std::regex_repalce
. So std::regex_search
ignores the format_first_only
but std::regex_replace
accepts it. Let's it goes.
The main problem is here that the icase
flag is also ignored when the pattern is character class -> []
. In fact when the pattern is only capital letter
or small letter
: [A-Z]
or [a-z]
Supposing this string s = "ONE TWO THREE four five six seven"
the output for c++ std
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 3 => Global match plus insensitive
whereas for the exact perl and d laugauge and c++ with boost
the output is:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 7 => Global match plus insensitive
I know about regex flavors PCRE; or ECMAScript 262 that c++ uses it, But I have no ides why a simple flag, is ignored for the only search function that c++ has? Since std::regex_iterator
and std::regex_token_iterator
are also use this function internally.
And shortly, I can not use those two my apps and RFC with std
library because if this!
So if someone knows according to which rule it is maybe a valid rude in ECMAScript 262
or perhaps if I am wrong anywhere please tell me. Thanks.
tested with
gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4
perl code:
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven" // 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" // 7
d code:
uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){
const bool flag_has_g = flags.indexOf( "g" ) != -1;
Regex!( char ) rx = regex( user_pattern, flags );
uint counter = 0;
foreach( mr; matchAll( user_string, rx ) ){
++counter;
}
if( flag_has_g ){
return counter;
} else {
if( counter >= 1 ) return 1;
else return 0;
}
}
the output:
writeln( count_match( s, "[A-Z]+", "g" ) ); // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7
js code:
var s = "ONE TWO THREE four five six seven";
var rx1 = new RegExp( "[A-Z]+" , "g" );
var rx2 = new RegExp( "[A-Z]+" , "gi" );
var counter = 0;
while( rx1.exec( s ) ){
++counter;
}
document.write( counter + "<br>" ); // 3
counter = 0;
while( rx2.exec( s ) ){
++counter;
}
document.write( counter ); // 7
Okay. After testing with gcc 7.1.0
it turned out that with version below 6.3.0
the output is: 1 3 3
and but with 7.1.0
the output is 1 3 7
here is the link.
Also with this version of clang
the output is correct. Here is the link. thanks to igor-tandetnik user