I want to split the mathematical expression -1+33+4.4+sin(3)-2-x^2 into tokens using a regex. I tested my regex on an online tester (link), which reports nothing wrong. When I use the same regex in C++, it throws the error "Invalid special open parenthesis". I looked for a solution and found a related Stack Overflow question (link), but it did not help me solve my problem. My regex is (?<=[-+*\/^()])|(?=[-+*\/^()]). In the C++ code I do not use the \.

My other problem is that I do not know how to determine whether a minus sign is a unary or a binary operator. If the minus is unary, I want the token to look like this: {-1}

I want the tokens to look like this: {-1,+,33,+,4.4,+,sin,(,3,),-,2,-,x,^,2}

The unary minus can be anywhere in the string.

If I remove ^ from the regex, it still fails.

code:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Collects the tokens produced by iterating over s with the regex rgx_str.
std::vector<std::string> split(const std::string& s, const std::string& rgx_str) {
    std::vector<std::string> elems;
    std::regex rgx(rgx_str);  // the error is thrown here for the regex below
    std::sregex_token_iterator iter(s.begin(), s.end(), rgx);
    std::sregex_token_iterator end;
    while (iter != end) {
        elems.push_back(*iter);
        ++iter;
    }
    return elems;
}

int main() {
    std::string str = "-1+33+4.4+sin(3)-2-x^2";
    std::string reg = "(?<=[-+*/()^])|(?=[-+*/()^])";
    std::vector<std::string> s = split(str, reg);
    for (const auto& a : s)
        std::cout << a << std::endl;
    return 0;
}
  • Can we assume that the unary minus can only occur at the string start? – Wiktor Stribiżew Jan 05 '21 at 22:11
  • There are several flavors of regular expressions in use. Without looking up C++'s `std::regex` implementation, it's likely that this syntax doesn't match what `std::regex` expects. Finally, on the topic of the minus sign, for this and other reasons it's simply not feasible to expect to parse arbitrarily complex mathematical expressions using regular expressions alone. Real parsers employ a regex-based lexer with a separate grammar parsing phase that uses an LALR(1) parser, typically, and treats `-` as a unary operator, and implements it directly on numeric constant operands. – Sam Varshavchik Jan 05 '21 at 22:11
  • @WiktorStribiżew no the unary minus can be anywhere –  Jan 05 '21 at 22:13
  • Could the ^ sign be the issue? Isn't that a reserved character representing the end of a string? – J. Lengel Jan 05 '21 at 22:13
  • @J.Lengel if I do not use ^ it is still wrong –  Jan 05 '21 at 22:15
  • Then you need to define the contexts where the `-` should be kept with a digit. – Wiktor Stribiżew Jan 05 '21 at 22:19
  • What happens if you prepend all reserved characters (?, +, *, (, ), ^, etc.) with a backslash? – J. Lengel Jan 05 '21 at 22:21
  • Given you are trying to match them and they are not part of the syntax – J. Lengel Jan 05 '21 at 22:21
  • If I just use (?<=[-+*/])|(?=[-+*/]) it is still wrong, with the same error –  Jan 05 '21 at 22:25
  • I have now solved the unary minus problem, so that is not the issue. I do not have to use the - sign in the positive lookbehind part –  Jan 05 '21 at 22:28
  • You can't do this with regular expressions. You need to write a proper scanner and expression parser. – user207421 Jan 05 '21 at 23:22
  • @user207421 I do not think so. I now know how to split the string; currently the only problem is the error. –  Jan 05 '21 at 23:28
  • Currently the problem is that you are using the wrong tool for the job. – user207421 Jan 05 '21 at 23:40
  • I agree. While this does not answer your question, I would suggest using some recursive parser (That way you can read and parse the string at the same time) – J. Lengel Jan 06 '21 at 11:26

1 Answer

By default, C++'s std::regex uses a modified ECMAScript regular expression grammar. It supports the lookaheads (?=...) and (?!...), but not lookbehinds, so (?<=...) is not valid std::regex syntax. That is why constructing the regex throws.
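
To see the difference between the two halves of the regex from the question, here is a minimal sketch. The helper name `compiles` is hypothetical (it is not part of the standard library); it only reports whether `std::regex` accepts a pattern under the default grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if std::regex accepts the pattern under
// the default (ECMAScript) grammar, false if construction throws.
bool compiles(const std::string& pattern) {
    try {
        std::regex rgx(pattern);  // throws std::regex_error on invalid syntax
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Called with the lookahead half, `compiles("(?=[-+*/()^])")`, this should return true, while the lookbehind half, `compiles("(?<=[-+*/()^])")`, should return false. The exact error text carried by the exception depends on the standard library implementation.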

There is a proposal to add lookbehind support in C++23, but it is not currently implemented.
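
Until then, one possible workaround is to match the tokens themselves instead of splitting at zero-width positions, which needs no lookbehind at all. The following is only a sketch under some assumptions: the function name `tokenize` and the token pattern are illustrative, and a `-` is treated as unary when it appears at the start of the input or directly after an operator or `(`, in which case it is attached to the following number or identifier.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Illustrative tokenizer: matches numbers, identifiers and single-character
// operators/parentheses one by one with std::sregex_iterator.
std::vector<std::string> tokenize(const std::string& s) {
    static const std::regex token_rgx(
        R"re(\d+\.?\d*|\.\d+|[A-Za-z_]\w*|[-+*/^()])re");
    std::vector<std::string> tokens;
    bool expect_operand = true;   // true at the start and after an operator or '('
    bool pending_minus = false;   // a unary '-' waiting for its operand

    for (std::sregex_iterator it(s.begin(), s.end(), token_rgx), end;
         it != end; ++it) {
        std::string tok = it->str();
        const unsigned char first = static_cast<unsigned char>(tok[0]);
        const bool is_operand = std::isalnum(first) || first == '.' || first == '_';

        // A '-' where an operand is expected is unary: hold it back for now.
        if (tok == "-" && expect_operand && !pending_minus) {
            pending_minus = true;
            continue;
        }
        if (pending_minus) {
            if (is_operand) tok = "-" + tok;   // attach: "-" + "1" -> "-1"
            else tokens.push_back("-");        // e.g. "-(": keep '-' separate
            pending_minus = false;
        }
        tokens.push_back(tok);
        // After an operand or ')', the next '-' would be a binary operator.
        expect_operand = !(is_operand || tok == ")");
    }
    if (pending_minus) tokens.push_back("-");  // trailing '-' with no operand
    return tokens;
}
```

For the expression in the question, `tokenize("-1+33+4.4+sin(3)-2-x^2")` should yield {-1,+,33,+,4.4,+,sin,(,3,),-,2,-,x,^,2}. Characters the pattern does not recognize (such as spaces) are silently skipped, and this only splits the input into tokens; checking that the expression is well formed would still need a parser, as the comments point out.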

heap underrun