38

For example, suppose I have a string like "first second third forth" and I want to match every single word in one operation, then output the words one by one.

I just thought that "(\\b\\S*\\b){0,}" would work. But actually it did not.

What should I do?

Here's my code:

#include<iostream>
#include<regex>
#include<string>
using namespace std;
int main()
{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";
    regex_search(str, res, exp);
    cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}   
AntiMoron
  • 2
    Here's a solution:regex exp("(.*)\\b\\S*\\b"); smatch res; string str = "first second third forth"; while (regex_search(str, res, exp, regex_constants::match_any)) { cout << res[0] << endl; str = res.suffix().str(); } – AntiMoron Feb 10 '14 at 02:50
  • This is the exact solution that worked for me too. Thank you! – Gregory Stein Sep 25 '19 at 12:58

6 Answers

39

Simply iterate over your string while regex_searching, like this:

{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";

    string::const_iterator searchStart( str.cbegin() );
    while ( regex_search( searchStart, str.cend(), res, exp ) )
    {
        cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];  
        searchStart = res.suffix().first;
    }
    cout << endl;
}
St0fF
  • 1
    If res.position() is relative to the original string, surely that should be ` searchStart = str.cbegin() + match.position() + match.length();`. – Chris Kitching Mar 24 '17 at 23:26
  • 2
    That's nearly correct. You may have overlooked the "+=" ;) It means that `res.position()` is relative to the search, not the original string. So your words are right only for the very first round of the loop. – St0fF Mar 26 '17 at 07:26
  • 4
    This is the only explanation that's made sense to me so far. Thanks! – awwsmm Feb 27 '18 at 16:13
  • 3
    You can also use `searchStart = res.suffix().first` to move the iterator to the first letter after the final match instead of `searchStart += res.position() + res.length()` if it's a bit clearer. – Tim MB Nov 27 '18 at 18:11
  • 2
    @TimMB thank you. It looks like it also spares 2 operations (still those ops will be done under the hood), thus I agree it looks much clearer. Hope it's OK to include your suggestion into my answer? – St0fF Nov 29 '18 at 09:06
26

This can be done with the C++11 regex library.

Two methods:

  1. You can use () in the regex to define your captures (sub-expressions).

Like this:

    string var = "first second third forth";

    const regex r("(.*) (.*) (.*) (.*)");  
    smatch sm;

    if (regex_search(var, sm, r)) {
        for (size_t i=1; i<sm.size(); i++) {  // sm[0] is the whole match, so start at 1
            cout << sm[i] << endl;
        }
    }

See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7

  2. You can use sregex_token_iterator():

     string var = "first second third forth";
    
     regex wsaq_re("\\s+"); 
     copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
         sregex_token_iterator(),
         ostream_iterator<string>(cout, "\n"));
    

See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0

Community
herohuyongtao
16

sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here: http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/

For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.

// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>

int main ()
{
  std::string s ("this subject has a submarine as a subsequence");
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  // default constructor = end-of-sequence:
  std::regex_token_iterator<std::string::iterator> rend;

  std::cout << "entire matches:"; 
  std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
  while (a!=rend) std::cout << " [" << *a++ << "]";
  std::cout << std::endl;

  std::cout << "2nd submatches:";
  std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
  while (b!=rend) std::cout << " [" << *b++ << "]";
  std::cout << std::endl;

  std::cout << "1st and 2nd submatches:";
  int submatches[] = { 1, 2 };
  std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
  while (c!=rend) std::cout << " [" << *c++ << "]";
  std::cout << std::endl;

  std::cout << "matches as splitters:";
  std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
  while (d!=rend) std::cout << " [" << *d++ << "]";
  std::cout << std::endl;

  return 0;
}

Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]
Ardent Coder
Steven
13

You could use the suffix() function, and search again until you don't find a match:

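#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";

    // Keep searching the remainder of the string until nothing matches.
    while (regex_search(str, res, exp)) {
        cout << res[0] << endl;
        str = res.suffix().str();
    }
}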
Mattia Fantoni
  • 4
    This way you're reassigning str on every loop. Appears to me as a waste of time and fragmentation of heap. – St0fF Feb 28 '18 at 16:52
8

My code will capture all groups in all matches:

vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
    // icase makes matching case-insensitive, so set it only when
    // case_sensitive is false.
    regex::flag_type flags = regex_constants::ECMAScript;
    if (!case_sensitive)
        flags |= regex_constants::icase;
    regex rx(reg_ex, flags);

    vector<vector<string>> captured_groups;
    vector<string> captured_subgroups;
    const std::sregex_token_iterator end_i;
    for (std::sregex_token_iterator i(s.cbegin(), s.cend(), rx);
        i != end_i;
        ++i)
    {
        captured_subgroups.clear();
        string group = *i;
        smatch res;
        // Re-run the regex on the matched text to recover its capture groups.
        if(regex_search(group, res, rx))
        {
            for(size_t g=0; g<res.size() ; g++)
                captured_subgroups.push_back(res[g]);

            if(captured_subgroups.size() > 0)
                captured_groups.push_back(captured_subgroups);
        }

    }
    return captured_groups;
}
Behrouz.M
  • 1
    You are leaking 'rx' on exceptions. Is there any reason why you don't allocate it on the stack? auto rx { regex(reg_ex, case_sensitive ? regex_constants::icase : 0) }; – Axel Rietschin Jun 06 '16 at 22:37
  • 1
    @AxelRietschin there is no reasonable reason! that time I did not know the default value for regex flag!!! – Behrouz.M Jun 07 '16 at 07:44
  • 2
    Answer has been updated according to @AxelRietschin comment. – Behrouz.M Jun 07 '16 at 07:49
5

My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" as you are looking for. However, the Boost library seems to support this, as described in C++ tokenize a string using a regular expression

Community
Peter Alfvin
  • Basically, if you want this functionality from `std::regex`, you have to handle in some way the splitting of the string at the end of the last match, and then re-check what's left until either at the end of the string, or no more matches are occurring. I don't have a working example, but nowadays in modern C++ you might be able to use `std::regex_token_iterator` to do the trick. http://en.cppreference.com/w/cpp/regex/regex_token_iterator – kayleeFrye_onDeck Jun 27 '17 at 22:23