14

After calling std::regex_search, I'm only able to get the first string result from the std::smatch for some reason:

Expression.assign("rel=\"nofollow\">(.*?)</a>");
if (std::regex_search(Tables, Match, Expression))
{
    for (std::size_t i = 1; i < Match.size(); ++i)
        std::cout << Match[i].str() << std::endl;
}

So I tried to do it another way - with an iterator:

const std::sregex_token_iterator End;
Expression.assign("rel=\"nofollow\">(.*?)</a>");
for (std::sregex_token_iterator i(Tables.begin(), Tables.end(), Expression); i != End; ++i)
{
    std::cout << *i << std::endl;
}

This does go through every match, but it gives me the whole matching string instead of just the capture that I was after. Surely there must be another way than having to do another std::regex_search on the iterator element in the loop?

Thanks in advance.

Nop

2 Answers

10

regex_token_iterator takes an optional fourth argument specifying which submatch is returned on each iteration. The default value of this argument is 0, which in C++ (and many other) regex flavors means "the whole match". If you want the first captured submatch, simply pass 1 to the constructor:

const std::sregex_token_iterator End;
Expression.assign("rel=\"nofollow\">(.*?)</a>");
for (std::sregex_token_iterator i(Tables.begin(), Tables.end(), Expression, 1); i != End; ++i)
{
    std::cout << *i << std::endl; // *i only yields the captured part
}
JohannesD
  • This needs more `for (auto it : ...)`. – rr- Oct 24 '15 at 12:40
  • @rr-, the foreach loop can be used in situations where the corresponding iterator loop would have the form `for(auto i = begin(container); i != end(container); ++i)`. This is not one of those cases. – JohannesD Oct 26 '15 at 17:22
  • You're right, of course, and I didn't mean to criticize your answer in any way. It's just an observation that comes from seeing how people create (nonstandard) stuff that lets you write code such as `for (auto i : range(10))`. I believe having such adapters for regex would make them more readable, and I think it's possible with some boost adapters. I certainly wouldn't complain if they were incorporated into stdlib at some point. – rr- Oct 26 '15 at 19:35
  • @rr- Ah, right. Yeah, a "regex_range" wrapper would certainly make the loop look cleaner. – JohannesD Oct 28 '15 at 21:04
  • Should it not be `const std::sregex_token_iterator End;` in the first line? Otherwise I get `error: no match for ‘operator!=’` – ph_0 May 12 '20 at 13:57
6

std::regex_search searches for the regex just once. It does not return a list of matches, but a list of submatched expressions (those within parentheses). This is why you only get one Match[1], the text inside the link tag.

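As for the second code, the iterator does visit all the matches. If you use std::sregex_iterator instead of std::sregex_token_iterator, dereferencing it gives you a match_results object for each match, so you can use the [] operator to pick the submatch you want: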

const std::sregex_iterator End;
Expression.assign("rel=\"nofollow\">(.*?)</a>");
for (std::sregex_iterator i(Tables.begin(), Tables.end(), Expression); i != End; ++i)
{
    std::cout << (*i)[1] << std::endl; // first submatch, same as above.
}
Diego Sevilla