
This is a continuation of a previous SO question and its discussion:

Different Between std::regex_match & std::regex_search

In that question, the following regex was written to fetch the day from the given input string:

std::string input{ "Mon Nov 25 20:54:36 2013" };
// Day: exactly two digits surrounded by whitespace on both sides
std::regex  r{R"(\s\d{2}\s)"};

In one of the answers, it was changed to R"(.*?\s(\d{2})\s.*)" to create a capture group and hence a first sub-match. Everything works fine for parsing the day using either regex_match or regex_search.

I have now written the following regular expressions to parse the various fields of the above input string:

std::string input{ "Mon Nov 25 20:54:36 2013" };

// DayStr: exactly three letters at the start, followed by whitespace (Output: Mon)
std::regex dayStrPattern{ R"(^\w{3}\s)" };
// Day: exactly two digits surrounded by whitespace on both sides (Output: 25)
std::regex dayPattern{ R"(\s\d{2}\s)" };
// Month: exactly three letters surrounded by whitespace on both sides (Output: Nov)
std::regex monthPattern{ R"(\s\w{3}\s)" };
// Year: exactly four digits at the end of the string (Output: 2013)
std::regex yearPattern{ R"(\s\d{4}$)" };
// Hour: exactly two digits with whitespace on the left and ':' on the right (Output: 20)
std::regex hourPattern{ R"(\s\d{2}:{1})" };
// Min: exactly two digits with ':' on both sides (Output: 54)
std::regex minPattern{ R"(:{1}\d{2}:{1})" };
// Second: exactly two digits with ':' on the left and whitespace on the right (Output: 36)
std::regex secPattern{ R"(:{1}\d{2}\s)" };

I have tested the above regexes and they seem to be correct.

Can we use the grouping mechanism here so that a single regex is passed to std::regex_search instead of 7 different ones? That way std::regex_search would store the output in the sub-matches of its std::smatch result. Is that possible here? I read the documentation and the book A Tour of C++, but did not understand much about regular expression grouping.

In general, when and how should we use/design grouping so that we get several pieces of information from one call to std::regex_search?

At this point I have to call std::regex_search 7 times with different regexes to fetch the various fields and then use them. I certainly think there is a better way to achieve this than what I am doing right now.

Mantosh Kumar
  • Can we make any assumption about the ordering of the date fields? Without such assumption, your current method might be better than single regex solution. – nhahtdh Nov 03 '14 at 02:22
  • @nhahtdh: Yes, we can assume that the ordering will always be this way. The main idea here is to understand when (not just in this example) and how to use grouping in regex. – Mantosh Kumar Nov 03 '14 at 02:25
  • There's an error in the question. I suggested adding parentheses around `\d{2}` to create a capture group. So the `regex` in my answer was `R"(.*?\s(\d{2})\s.*)"` (notice the extra parentheses). Your example code has no capture groups defined. – Praetorian Nov 03 '14 at 02:38

1 Answer


There's no need to call regex_search 7 times to match 7 parts of the same input; just create multiple capture groups instead of a single one each time. For example, change your regex to

std::regex r{R"(^(\w{3}) (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) (\d{4})$)"};

Then all the matches can be obtained through match_results after a single call to regex_match:

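std::smatch match;
// match[0] is the entire match; match[1] through match[7] are the capture groups.
if (std::regex_match(input, match, r)) {
    for (auto const& m : match) {
        std::cout << m << '\n';
    }
}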


Praetorian
  • Does the order of the groups have to be in sync with the input? Could you please explain this concept a bit more? – Mantosh Kumar Nov 03 '14 at 02:29
  • @MantoshKumar Yes, the fields in the input always need to be in the order shown. Also, you may want to change the digit captures to `(\d{1,2})` to deal with single digit dates and times (for instance, if it's the first of the month and your input is not zero padded). – Praetorian Nov 03 '14 at 02:33
  • @MantoshKumar Not sure what explanation you're looking for. You wanted to extract 7 fields from the input, so I created 7 capture groups, one for each. There are 8 `match_results` outputs because the zeroth element is always the entire match. The remaining (1-7) are the fields from the input string in the same order as the capture groups in the `regex`. – Praetorian Nov 03 '14 at 02:36
  • You got 8 match_results because it has 8 parenthesis pairs. The 8 match_results are in the order in which their LEFT parentheses appear. – Robin Hsu Nov 03 '14 at 06:14
  • @Praetorian: Thanks for the excellent information and analysis. You cleared up all the doubts in my SO question. – Mantosh Kumar Nov 03 '14 at 07:05
  • @RobinHsu There are only 7 sets of parentheses pertinent to the regular expression. The outer one around the entire expression is part of the raw string literal, and not the regex. There are 8 results because, as I explained in the earlier comment, the zeroth element of the `match_results` is always the entire match. The captures are present from index 1 onwards. – Praetorian Nov 03 '14 at 16:39