How to match multiple results using std::regex

Question

For example, if I have a string like "first second third forth", I want to match every word in one operation and output them one by one.

I thought that "(\\b\\S*\\b){0,}" would work, but it did not.

What should I do?

Here's my code:

#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";
    regex_search(str, res, exp);
    cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}

Here's a solution: `regex exp("(.*)\\b\\S*\\b"); smatch res; string str = "first second third forth"; while (regex_search(str, res, exp, regex_constants::match_any)) { cout << res[0] << endl; str = res.suffix().str(); }` — AntiMoron, Feb 10 '14 at 02:50
This is the exact solution that worked for me too. Thank you! — Gregory Stein, Sep 25 '19 at 12:58

score 39 · Answer 1 · edited Nov 29 '18 at 09:07

Simply iterate over your string while regex_searching, like this:

{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";

    string::const_iterator searchStart( str.cbegin() );
    while ( regex_search( searchStart, str.cend(), res, exp ) )
    {
        cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];  
        searchStart = res.suffix().first;
    }
    cout << endl;
}

edited Nov 29 '18 at 09:07

answered Jan 26 '16 at 23:39

St0fF

If res.position() is relative to the original string, surely that should be `searchStart = str.cbegin() + match.position() + match.length();`. – Chris Kitching Mar 24 '17 at 23:26

That's nearly correct. You may have overlooked the "+=" ;) It follows that `res.position()` is relative to the start of the search, not the original string, so your version is only right in the very first round of the loop. – St0fF Mar 26 '17 at 07:26

This is the only explanation that's made sense to me so far. Thanks! – awwsmm Feb 27 '18 at 16:13

You can also use `searchStart = res.suffix().first` to move the iterator to the first character after the match instead of `searchStart += res.position() + res.length()` if it's a bit clearer. – Tim MB Nov 27 '18 at 18:11

@TimMB thank you. It also appears to spare two operations (though they will still be done under the hood), and I agree it looks much clearer. Hope it's OK to include your suggestion in my answer? – St0fF Nov 29 '18 at 09:06
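
The comments above hinge on what `position()` is measured from. Here is a minimal sketch of recovering offsets relative to the original string while advancing a search iterator. It assumes a hypothetical helper `word_offsets`, which is not part of the answer, and the caller supplies the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer): returns every match of `re` in
// `text`, paired with its offset from the start of `text`.
std::vector<std::pair<std::size_t, std::string>>
word_offsets(const std::string& text, const std::regex& re)
{
    std::vector<std::pair<std::size_t, std::string>> out;
    std::smatch res;
    std::string::const_iterator searchStart = text.cbegin();
    while (std::regex_search(searchStart, text.cend(), res, re))
    {
        // res.position() counts from searchStart, so add what was already consumed.
        const std::size_t absolute =
            static_cast<std::size_t>(searchStart - text.cbegin()) +
            static_cast<std::size_t>(res.position());
        assert(absolute <= text.size());
        out.emplace_back(absolute, res.str());
        if (res.length() == 0)
            break;  // stop after an empty match, which would otherwise loop forever
        searchStart = res.suffix().first;
    }
    return out;
}
```

Calling `word_offsets(str, std::regex("\\S+"))` on the string from the question should pair each word with its index in `str`, rather than with its index in the remaining suffix.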

score 26 · Accepted Answer · edited Nov 06 '22 at 12:25

This can be done with the C++11 <regex> library.

Two methods:

You can use () in the regex to define captures (sub-expressions), provided you know in advance how many words to expect.

Like this:

    string var = "first second third forth";

    const regex r("(.*) (.*) (.*) (.*)");  
    smatch sm;

    if (regex_search(var, sm, r)) {
        for (size_t i = 1; i < sm.size(); i++) {
            cout << sm[i] << endl;
        }
    }

See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7

You can use sregex_token_iterator():

 string var = "first second third forth";

 regex wsaq_re("\\s+"); 
 copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
     sregex_token_iterator(),
     ostream_iterator<string>(cout, "\n"));

See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0

edited Nov 06 '22 at 12:25

Community

answered Feb 24 '14 at 09:07

herohuyongtao

Which is better? Why? – Yola Feb 06 '17 at 20:38
I've tried using `smatch.size()` and switched to `regex.mark_count()+1` after size was causing out-of-range errors to trigger with similar code. – kayleeFrye_onDeck Jun 27 '17 at 22:13
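
On the `size()` versus `mark_count()` point: `smatch::size()` is 0 when the search failed and `mark_count() + 1` when it succeeded, so looping up to `size()` is only meaningful after checking the return value of `regex_search`. Here is a minimal sketch, assuming a hypothetical helper `capture_groups` that is not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture groups 1..N of the first match,
// or an empty vector when nothing matches.
std::vector<std::string> capture_groups(const std::string& text, const std::regex& re)
{
    std::vector<std::string> groups;
    std::smatch sm;
    if (!std::regex_search(text, sm, re))
        return groups;  // sm.size() is 0 here, so there is nothing to index
    // After a successful search, size() is the number of marked groups
    // plus one for the whole match stored at index 0.
    assert(sm.size() == re.mark_count() + 1);
    for (std::size_t i = 1; i < sm.size(); ++i)
        groups.push_back(sm[i].str());
    return groups;
}
```

With the pattern `(.*) (.*) (.*) (.*)` this returns four groups for the example string and an empty vector for an input with fewer than three spaces.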

score 16 · Answer 3 · edited Jun 14 '20 at 18:32

sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here: http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/

For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.

// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>

int main ()
{
  std::string s ("this subject has a submarine as a subsequence");
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  // default constructor = end-of-sequence:
  std::regex_token_iterator<std::string::iterator> rend;

  std::cout << "entire matches:"; 
  std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
  while (a!=rend) std::cout << " [" << *a++ << "]";
  std::cout << std::endl;

  std::cout << "2nd submatches:";
  std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
  while (b!=rend) std::cout << " [" << *b++ << "]";
  std::cout << std::endl;

  std::cout << "1st and 2nd submatches:";
  int submatches[] = { 1, 2 };
  std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
  while (c!=rend) std::cout << " [" << *c++ << "]";
  std::cout << std::endl;

  std::cout << "matches as splitters:";
  std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
  while (d!=rend) std::cout << " [" << *d++ << "]";
  std::cout << std::endl;

  return 0;
}

Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]

This is beautiful! It's exactly what I was looking for, for some equivalence with Python regex.findall.. — MC-8, Dec 04 '20 at 18:32
Very nice! +1. Any idea how to get the position of the match? — Mecanik, Oct 18 '22 at 08:05
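
On getting the position of each match: one option is `std::sregex_iterator`, which yields a full `std::smatch` per match instead of a single sub-match, so `position()` and `length()` are available. Here is a minimal sketch, assuming a hypothetical helper `match_positions` that is not part of the cplusplus.com sample.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: offsets, counted from the start of `text`,
// of every match of `re`.
std::vector<std::size_t> match_positions(const std::string& text, const std::regex& re)
{
    std::vector<std::size_t> positions;
    const std::sregex_iterator end;  // default-constructed iterator marks the end
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it)
    {
        const std::size_t pos = static_cast<std::size_t>(it->position());
        // Sanity check: the match lies inside the searched string.
        assert(pos + static_cast<std::size_t>(it->length()) <= text.size());
        positions.push_back(pos);
    }
    return positions;
}
```

Unlike a hand-written `regex_search` loop over the suffix, the positions reported through the iterator are measured from the beginning of the original range.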

score 13 · Answer 4 · edited Mar 21 '22 at 12:26

You could use the suffix() function, and search again until you don't find a match:

int main()
{
    regex exp("(\\b\\S*\\b)");
    smatch res;
    string str = "first second third forth";

    while (regex_search(str, res, exp)) {
        cout << res[0] << endl;
        str = res.suffix();
    }
}

edited Mar 21 '22 at 12:26

Sabito stands with Ukraine

answered Aug 22 '16 at 13:16

Mattia Fantoni

This way you're reassigning str on every loop. Appears to me as a waste of time and fragmentation of heap. – St0fF Feb 28 '18 at 16:52

score 8 · Answer 5 · edited Mar 21 '22 at 12:27

My code will capture all groups in all matches:

// Assumes <regex>, <string>, <vector> and `using namespace std;`.
vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
    // Set icase only for a case-insensitive search.
    const auto flags = case_sensitive
        ? regex_constants::ECMAScript
        : regex_constants::ECMAScript | regex_constants::icase;
    regex rx(reg_ex, flags);
    vector<vector<string>> captured_groups;
    vector<string> captured_subgroups;
    const std::sregex_token_iterator end_i;
    // Each token is the text of one whole match.
    for (std::sregex_token_iterator i(s.cbegin(), s.cend(), rx);
        i != end_i;
        ++i)
    {
        captured_subgroups.clear();
        string group = *i;
        smatch res;
        // Search the matched text again to recover its capture groups.
        if(regex_search(group, res, rx))
        {
            // res[0] is the whole match; res[1] onwards are the groups.
            for(size_t g=0; g<res.size(); g++)
                captured_subgroups.push_back(res[g]);

            captured_groups.push_back(captured_subgroups);
        }
    }
    return captured_groups;
}
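
Searching each token a second time costs an extra search and can lose context at the token edges, for example a lookahead that depends on the text following the match. Here is a minimal sketch of an alternative that uses `std::sregex_iterator`, whose elements already carry every capture group. It assumes a hypothetical free function `find_all_groups` that is not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical free function: one inner vector per match; index 0 holds the
// whole match and the following indices hold the capture groups.
std::vector<std::vector<std::string>>
find_all_groups(const std::string& s, const std::regex& rx)
{
    std::vector<std::vector<std::string>> all;
    const std::sregex_iterator end;  // default-constructed iterator marks the end
    for (std::sregex_iterator it(s.begin(), s.end(), rx); it != end; ++it)
    {
        std::vector<std::string> groups;
        for (std::size_t g = 0; g < it->size(); ++g)
            groups.push_back((*it)[g].str());
        assert(!groups.empty());  // a match always has at least the whole-match entry
        all.push_back(std::move(groups));
    }
    return all;
}
```

The regex is passed in already constructed, so the caller decides on flags such as `std::regex_constants::icase`.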

edited Mar 21 '22 at 12:27

Sabito stands with Ukraine

answered May 28 '15 at 00:38

Behrouz.M

You are leaking 'rx' on exceptions. Is there any reason why you don't allocate it on the stack? `auto rx { regex(reg_ex, case_sensitive ? regex_constants::icase : 0) };` – Axel Rietschin Jun 06 '16 at 22:37

@AxelRietschin There is no good reason! At that time I did not know the default value for the regex flag! – Behrouz.M Jun 07 '16 at 07:44

The answer has been updated according to @AxelRietschin's comment. – Behrouz.M Jun 07 '16 at 07:49

score 5 · Answer 6 · edited May 23 '17 at 10:30

My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" of the kind you are looking for. However, the Boost library seems to support this, as described in "C++ tokenize a string using a regular expression".

edited May 23 '17 at 10:30

Community

answered Feb 10 '14 at 01:11

Peter Alfvin

Basically, if you want this functionality from `std::regex`, you have to split the string at the end of the last match yourself and re-check what's left until you reach the end of the string or no more matches occur. I don't have a working example, but in modern C++ you might be able to use `std::regex_token_iterator` to do the trick. http://en.cppreference.com/w/cpp/regex/regex_token_iterator – kayleeFrye_onDeck Jun 27 '17 at 22:23
