I am exploring the regular expression library in C++11 and I am a bit confused about the behavior of the `regex_search()` function. My sample program is shown below; it produces the following output:

String that matches the pattern:

test 1 2 3 4 5 abc def abc

My question is: why does it match only `test 1 2 3 4 5 abc def abc` and not the shorter `test 1 2 3 4 5 abc`? Shouldn't the shorter string also match the given regular expression? Can someone please help me understand?

#include <iostream> 
#include <regex> 
#include <string> 
using namespace std; 
   
int main()
{
    std::string inputStr = "test 1 2 3 4 5 abc def abc";
    std::string regexPattern = "test 1.*abc";
    
    regex regexp(regexPattern, std::regex::grep);
    smatch m; 
    
    while(std::regex_search(inputStr, m, regexp, std::regex_constants::match_default))
    {
        std::cout<<"String that matches the pattern: "<< m.str() << std::endl;
        inputStr = m.suffix();
    }
    
    return 0;
}

πάντα ῥεῖ
bobix
  • Regular expression engines use "greedy matching" by default, so `.*` matches "as much as it can". – Botje May 05 '23 at 08:03
  • Side note: `#include ` probably was a typo, otherwise note it doesn't include the definition for `std::string`. – πάντα ῥεῖ May 05 '23 at 08:20
  • Can you explain the pattern in more detail? Is it supposed to be 5 numbers each separated by a space followed by three 3-letter words whose characters are neighbors? – Dimitar May 05 '23 at 08:20
  • @Dimitar, The pattern and the input string can be in any format. My intention in the above example was to match a pattern which starts with 'test 1' and ends with 'abc'; any characters can be in between the start and end. – bobix May 05 '23 at 08:27
  • @Botje, Is there any option to change the default behaviour in the regex library so that it matches the first occurrence also? Or is there any other method in the regex library to match all the occurrence of the given pattern in a string? – bobix May 05 '23 at 08:30
  • A second thing to know is that many regular expression engines do not support overlapping matches. – Botje May 05 '23 at 08:33
  • `.*?` should match [non-greedily](https://en.wikipedia.org/wiki/Regular_expression#Lazy_matching) – but only seems to work with `ECMAScript` (which is the default). – Aconcagua May 05 '23 at 08:36
  • [This](https://stackoverflow.com/questions/30007942/c-regex-non-greedy-match) might be of relevance (if not duplicate). – Aconcagua May 05 '23 at 08:39

1 Answer

I was able to make `regex_search()` match the first occurrence of the pattern (i.e. `test 1 2 3 4 5 abc` instead of `test 1 2 3 4 5 abc def abc`) by making the search 'lazy' instead of the default 'greedy'. Thanks to @Botje for the comment on 'greedy matching', which gave me the hint. Below is the code from the question with the changes applied; the changed lines are marked with comments.

#include <iostream>
#include <regex>
#include <string> 
using namespace std; 
   
int main()
{
    std::string inputStr = "test 1 2 3 4 5 abc def abc";
    std::string regexPattern = "test 1.*?abc"; // Added lazy quantifier '?' after the '.*' to make the search lazy instead of greedy
    
    regex regexp(regexPattern); // Removed std::regex::grep, which selects the POSIX grep (BRE-based) grammar; that flavor does not support lazy quantifiers. The default ECMAScript grammar does.
    smatch m; 
    
    while(std::regex_search(inputStr, m, regexp, std::regex_constants::match_default))
    {
        std::cout<<"String that matches the pattern: "<< m.str() << std::endl;
        inputStr = m.suffix();
    }
    
    return 0;
}

Useful links:

  • c11-regex-non-greedy
  • what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions

bobix