
I know that:
Lazy quantifier matches: As Few As Possible (shortest match)

I also know that the constructor defaults to the ECMAScript grammar:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

And:
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... en.cppreference

And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... en.cppreference

And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences...std::regex_match


Here is my code:

#include <iostream>
#include <string>
#include <regex>

int main(){

        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;

        using namespace std::regex_constants;

        // regex_search honors the lazy quantifier .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // regex_match does not seem to honor the lazy quantifier .+?
        test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
} 

and the output:

1
s/one/
1
s/one/two/three/four/five/six/g

Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.

I expected std::regex_match to match nothing and return 0 with the non-greedy quantifier .+?

In fact, the non-greedy quantifier .+? behaves here exactly like the greedy one: both /.+?/ and /.+/ match the same string, even though they are different patterns. So why is the question mark ignored?

regex101

Fast test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
s/one/

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
s/one/two/three/four/five/six/g

NOTE
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); (non-greedy)
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); (greedy)
produce the same output with std::regex_match: both match the entire string!
With std::regex_search, however, they produce different output.
Also, s? and g? do not matter: with /.*?/ it still matches the entire string!

More Detail

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901
Shakiba Moshiri
  • There is no issue here, the results are expected. Note that "Lazy quantifier matches: As Few As Possible (shortest match)" is a wrong statement, as a lazy quantifier just makes the regex engine grab the matching text up to the leftmost occurrence of the subsequent subpattern(s); it does not yield the shortest match. And a dot matches any symbol other than a line break, so `^.*?$` is the same as `^.*$`. – Wiktor Stribiżew Feb 23 '17 at 19:28
  • @WiktorStribiżew. I have already seen your other answers about `lazy-quantifier`, but here it does not **make sense** to me why `?` is ignored. I turned on the flags `match_not_bol | match_not_eol` but it is still ignored! – Shakiba Moshiri Feb 23 '17 at 19:35
  • The quantifiers are never *ignored*. Moreover, it is not `?`, it is `+?`. sln already informed you that `regex_match` requires a full string match (=anchors the match at start and end of string), and as `.` matches any char, the `.+?` just matches up to the first obligatory subpattern on the right. [Here is the regex demo](https://regex101.com/r/jpgcck/2). When you use `regex_search`, the results would [look like this](https://regex101.com/r/StYFOS/1). – Wiktor Stribiżew Feb 23 '17 at 21:38
  • The *`std::regex_match` should not match anything* is wrong just because `^s?/.+?/g?$` matches the first `s`, then `/`, then any 1+ chars, as few as possible, up to the first `/` that may be followed by `g` (not necessarily) and that is at the end of the string. – Wiktor Stribiżew Feb 23 '17 at 21:40
  • @WiktorStribiżew. You are **right**. If I use `^` at the beginning and `$` at the end of the **regex**, it matches **the whole** string. But I passed the flags `match_not_bol | match_not_eol` and it still matches! **If the flags are ignored**, then why do they exist at all? – Shakiba Moshiri Feb 23 '17 at 21:51
  • These flags do not unanchor the pattern, they have no effect on `regex_match`. If you go to the `regex_match` reference page, you will see it is repeated several times: the pattern must match the entire char sequence. The flags are only meant to make `^` and `$` match at the very start/end of string with `regex_search`. – Wiktor Stribiżew Feb 23 '17 at 21:57

1 Answer


I don't see any inconsistency. regex_match tries to match the whole string, so s?/.+?/g? lazily expands till the whole string is covered.

These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:

Non-greedy:

a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba  # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba  # let's try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".

Greedy:

a.*a: ababa
a|.*a: a|baba
a.*|a: ababa|  # try .* == "baba" first
# backtrack
a.*|a: abab|a  # try .* == "bab" now
a.*a|: ababa|

And regex_match( abc ) is like regex_search( ^abc$ ) in this case.

Kirill Bulygin
  • Why does it **expand** when I use **non-greedy**? – Shakiba Moshiri Feb 23 '17 at 17:59
  • @k-five The reason is, regex match has implicit BOS/EOS anchors `^$`, the engine will always prefer to match over non-match. Non-greedy is just a suggestion, not a reality. –  Feb 23 '17 at 18:04
  • @sln I know about that. With the `match_not_eol | match_not_bol` flags it still matches! – Shakiba Moshiri Feb 23 '17 at 18:08
  • @KirillBulygin I know about that. With the `match_not_bol | match_not_eol` flags it still matches! – Shakiba Moshiri Feb 23 '17 at 18:13
  • @k-five I think the flags `match_not_bol | match_not_eol` are ignored for regex_match. Better check the Boost regex docs. Also, as said, regex_match can _only_ match the whole string. The regex `.+?` actually gives it a way to do that. –  Feb 23 '17 at 18:17