
I am using a regex to extract substrings from an RPN formula. For example, given this formula:

10 2 / 3 + 7 4

I use this regex to extract the substrings (I hope it returns {"10", "2", "/", "3", "+", "7", "4"}):

([0-9]+|[\/*+-])\s?

First, I tried it in Python:

import re

s = r"([0-9]+|[\/*+-])\s?"

text = "10 2 / 3 + 7 4"

x = re.findall(s,text)

for i in x:
    print(i)

And this is the output, as I expected:

10
2
/
3
+
7
4

However, when I use the expression in C++:

#include <bits/stdc++.h>
#include <regex>
using namespace std;
int main(){
    string text = "10 2 / 3 + 7 4"; 
    smatch m;
    regex rgx("([0-9]+|[\/*+-])\s+"); 

    regex_search(text, m, rgx);

    for (auto x : m){
        cout << x << " ";
    }
}

The compiler returns two warnings on line 7 (unknown escape sequence: '\/' and '\s'), and the program prints nothing but several spaces.

I want to know what is the problem with my expression when I use it in C++?

  • Perhaps you should escape your backslashes. In Python you're using the `r` prefix to support literal backslashes in the string, but you're not doing anything to support them in the C++ version. – khelwood Jan 13 '21 at 15:13
  • Your Python string `s` is a raw string that doesn't treat '\' as an escape character. You can do something [similar](https://en.cppreference.com/w/cpp/language/string_literal) in C++, or you can escape your '\' by turning them into '\\'. – Nathan Pierson Jan 13 '21 at 15:13
  • Note that you are using a raw string (`r` prefix) in Python, but you are not doing the equivalent in C++ (`R"(...)"`). – 0x5453 Jan 13 '21 at 15:14
  • I have tried to escape my backslashes already. Although no warnings are returned, the code returns only 3 spaces as before. – Anh Nguyễn Tuấn Jan 13 '21 at 15:15
  • See [This regex doesn't work in c++](http://stackoverflow.com/questions/31098881/this-regex-doesnt-work-in-c). – Wiktor Stribiżew Jan 13 '21 at 15:16
  • Note also that [regex_search](https://en.cppreference.com/w/cpp/regex/regex_search) only finds the _first_ match in the string and is not equivalent to `findall`. Looks like you might want to look into [regex_iterator](https://en.cppreference.com/w/cpp/regex/regex_iterator) to generate all matches. – Nathan Pierson Jan 13 '21 at 15:26
  • Yes, see [How to match multiple results using std::regex](https://stackoverflow.com/questions/21667295/how-to-match-multiple-results-using-stdregex). This and the above explain all you need. – Wiktor Stribiżew Jan 13 '21 at 15:30

2 Answers


\ is the escape character in C++ string literals. You have to write \ as \\ so that a single backslash reaches the regex engine.

    regex rgx("([0-9]+|[\\/*+-])\\s+"); 

Another option is to use a raw string literal (since C++11):

    regex rgx(R"(([0-9]+|[\/*+-])\s+)"); 
MikeCAT

There are two issues. First, the backslashes in your C++ string literal are not preserved the way they are in your Python raw string, so they never reach the regex engine.

Possible options:

regex rgx("([0-9]+|[\\/*+-])\\s+"); // Escaping the backslashes
regex rgx(R"(([0-9]+|[\/*+-])\s+)");  // Using a raw string literal

Second, the behavior of std::regex_search isn't exactly analogous to re.findall: it only finds the first subsequence that matches the target expression, not all of them.

cppreference:

In order to examine all matches within the target sequence, std::regex_search may be called in a loop, restarting each time from m[0].second of the previous call. std::regex_iterator offers an easy interface to this iteration.

std::regex_iterator

Code using the latter could look like

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "10 2 / 3 + 7 4";
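    // \s? (as in the Python version) lets the last token match without trailing whitespace.
    std::regex rgx("([0-9]+|[\\/*+-])\\s?");

    auto operands_begin = std::sregex_iterator(text.begin(), text.end(), rgx);
    auto operands_end = std::sregex_iterator();

    for (std::sregex_iterator itr = operands_begin; itr != operands_end; ++itr)
    {
        // Print capture group 1 (the token); the whole match would include the whitespace.
        std::cout << itr->str(1) << '\n';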
    }
}

(As a minor note, I've replaced the <bits/stdc++.h> and using namespace std; with explicit includes and namespace qualifications.)

Nathan Pierson