
I have the following RE2 regex:

const re2::RE2 numRegex("(([0-9]+),)+([0-9])+");
std::string inputStr;
inputStr = "apple with make,up things $312,412,3.00";
RE2::Replace(&inputStr, numRegex, "$1$3");
std::cout << inputStr;

Expected

apple with make,up things $3124123.00

I am trying to remove the commas inside the recognized number, but $1 only matches the 312 part and not the 412 part. How can I extract every repetition of the repeated pattern in the group?

Note that RE2 doesn't support lookahead (see "Using positive-lookahead (?=regex) with re2"), and all the solutions I found use lookaheads.

Wiktor Stribiżew
MG.
  • Sorry, I'm not really clear on what you want to achieve. Do you simply want to remove commas from numbers? I'd match on the comma-based number format, then do a second pass to remove the commas. – ggorlen Apr 07 '21 at 19:10
  • You are repeating a capture group, which will then hold only the value of its last iteration. – The fourth bird Apr 07 '21 at 19:11
  • You just want to remove commas between digits. Use `std::regex numRegex(R"((\d),(?=\d))");` and then replace with `$1`: `regex_replace(inputStr, numRegex, "$1")`. – Wiktor Stribiżew Apr 07 '21 at 19:21

2 Answers


RE2 based solution

As RE2 does not support lookarounds, there is no pure single-pass regex solution.

A workaround is to run a global replacement twice with the (\d),(\d) regex and the \1\2 rewrite string. Note that RE2 rewrite strings use \1-style backreferences rather than $1, and that RE2::Replace only replaces the first match, so RE2::GlobalReplace is used here:

const re2::RE2 numRegex(R"((\d),(\d))");
std::string inputStr("apple with make,up things $312,412,3.00");
RE2::GlobalReplace(&inputStr, numRegex, R"(\1\2)");
RE2::GlobalReplace(&inputStr, numRegex, R"(\1\2)"); // <- Second pass to remove commas left over in 1,2,3,4 like strings
std::cout << inputStr;
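
To see why the second pass matters, here is a minimal sketch that uses std::regex purely as a stand-in for RE2, an assumption made so the example needs no extra library. The helper name strip_once is hypothetical. Matches cannot overlap, so in a string like 1,2,3,4 the digit 2 is consumed by the first match and cannot also start a match for the comma that follows it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: one global pass that drops a comma sitting between
// two digits. std::regex stands in for RE2 here; no lookahead is used.
std::string strip_once(const std::string& s) {
    static const std::regex digitCommaDigit(R"((\d),(\d))");
    // std::regex_replace replaces every non-overlapping match by default.
    return std::regex_replace(s, digitCommaDigit, "$1$2");
}
```

Calling strip_once on 1,2,3,4 leaves one comma behind (12,34), and a second call on that result removes it. For the string in the question a single pass happens to be enough, because each comma is separated from the next by more than one digit.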

C++ std::regex based solution:

You can remove the commas between digits using

std::string inputStr("apple with make,up things $312,412,3.00");
std::regex numRegex(R"((\d),(?=\d))"); 
std::cout << regex_replace(inputStr, numRegex, "$1") << "\n";
// => apple with make,up things $3124123.00

See the C++ demo. Also, see the regex demo here.

Details:

  • (\d) - Capturing group 1 ($1): a digit
  • , - a comma
  • (?=\d) - a positive lookahead that requires a digit immediately to the right of the current location.
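
If no regex engine with lookahead is available, the same rule (drop a comma only when it has a digit on both sides) can also be written as a plain loop. This is a sketch under that assumption, not part of either library, and the function name remove_commas_between_digits is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: copies s, skipping every comma that has a digit
// immediately before it and immediately after it in the original string.
std::string remove_commas_between_digits(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool digitBefore =
            i > 0 && std::isdigit(static_cast<unsigned char>(s[i - 1]));
        const bool digitAfter =
            i + 1 < s.size() && std::isdigit(static_cast<unsigned char>(s[i + 1]));
        if (s[i] == ',' && digitBefore && digitAfter) {
            continue;  // comma inside a number: drop it
        }
        out.push_back(s[i]);
    }
    return out;
}
```

Because the neighbours are checked in the original string rather than in the output, a single pass also handles inputs such as 1,2,3,4.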
Wiktor Stribiżew

In the pattern that you tried, you are repeating the outer group (([0-9]+),)+, which will then contain only the value of its last iteration, where it matches one or more digits and a comma.

The last iteration captures 412, so the earlier 312, is matched but not kept in any group.
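
The effect can be checked with a small sketch. It uses std::regex instead of RE2, an assumption made only to keep the example free of external libraries, and the helper name groups_after_match is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs the pattern from the question once and returns
// the text held by capture groups 1, 2 and 3 after the match.
std::vector<std::string> groups_after_match(const std::string& s) {
    static const std::regex numRegex("(([0-9]+),)+([0-9])+");
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(s, m, numRegex)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());  // each group holds its last iteration only
        }
    }
    return groups;
}
```

For the number in the question, the groups come back holding the 412 iteration and the final digit 3; the 312 iteration is not available in any group.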


You are using RE2, but as an alternative, if you have Boost available, you could make use of the \G anchor. It asserts the position at the end of the previous match, which allows iterative matches, and each matched comma can then be replaced with an empty string.

(?:\$|\G(?!^))\d+\K,(?=\d)

The pattern matches:

  • (?: Non capture group
    • \$ match $
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start
  • ) Close non capture group
  • \d+\K Match 1+ digits and forget what is matched so far
  • ,(?=\d) Match a comma and assert a digit directly to the right

Regex demo

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
    std::string inputStr = "apple with make,up things $312,412,3.00";
    // \G continues from the end of the previous match; \K drops the digits from the match
    boost::regex numRegex("(?:\\$|\\G(?!^))\\d+\\K,(?=\\d)");
    std::string result = boost::regex_replace(inputStr, numRegex, "");
    std::cout << result << std::endl;
}

Output

apple with make,up things $3124123.00
The fourth bird