
My regex_replace expression uses group $1 right before a '0' character in the replacement string like so:

#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main() {
    regex regex_a( "(.*)bar(.*)" );
    cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;
    cout << regex_replace( "foobar0x1", regex_a, "$1 0xNUM" ) << endl;
}

The output is:

xNUM
foo 0xNUM

I'm trying to get output foo0xNUM without the middle whitespace.

How do I guard the group name $1 from the next character in the substitution string?

Praetorian
srking
  • This is infuriating. My first idea was to use ${1}, but this is not supported. My next idea was to use a named capturing group, which is not supported either. I'm really curious about the correct answer. – timgeb Apr 22 '15 at 23:11
  • @timgeb - yes, I'm migrating from boost::regex where ${1} worked fine. – srking Apr 22 '15 at 23:17

3 Answers


The replacement format accepts either $n or $nn to reference a captured group, so you can use the two-digit $nn form (here $01) to keep the following 0 from being read as part of the group number.

cout << regex_replace( "foobar0x1", regex_a, "$010xNUM" ) << endl;
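
For reference, here is a minimal sketch that wraps the same call in a function. The name guarded_replace is hypothetical and not part of the standard library; the pattern and replacement string are the ones from the question and the line above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a standard function): replaces the whole match
// with capture group 1 followed by the literal text "0xNUM".
// The two-digit reference $01 ends the group number before the literal '0'.
std::string guarded_replace(const std::string& input) {
    static const std::regex regex_a("(.*)bar(.*)");
    return std::regex_replace(input, regex_a, "$010xNUM");
}
```

Calling guarded_replace("foobar0x1") returns foo0xNUM; an input that does not contain "bar" is returned unchanged because nothing matches.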
nhahtdh
Guvante

Guvante has provided a solution to this problem.

However, is the behavior well-defined according to the specification?

To start with the conclusion: yes, the solution has well-defined behavior.

C++ specification

The documentation of format_default, which specifies ECMA rules to interpret the format string, points to Section 15.5.4.11 of ECMA-262.

ECMA-262 specification

According to Table 22 in Section 15.5.4.11 of the ECMA-262 specification:

$n

The nth capture, where n is a single digit in the range 1 to 9 and $n is not followed by a decimal digit. If n ≤ m and the nth capture is undefined, use the empty String instead. If n > m, the result is implementation-defined.

$nn

The nnth capture, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth capture is undefined, use the empty String instead. If nn > m, the result is implementation-defined.

The variable m is defined in the previous paragraph of the same section:

[...] Let m be the number of left capturing parentheses in searchValue (using NcapturingParens as specified in 15.10.2.1).

Replacement string in the question: "$10xNUM"

Back at the code in the question:

cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;

Since $1 is followed by 0, it has to be interpreted under the second rule, $nn, because the first rule forbids a digit immediately after $n. However, the pattern has only 2 capturing groups (m = 2) and 10 > 2, so the behavior is implementation-defined according to the specification.

We can see the effect of the implementation-defined clause by comparing the result of functionally equivalent JavaScript code in Firefox 37.0.1:

> "foobar0x1".replace(/(.*)bar(.*)/g, "$10xNUM" )
< "foo0xNUM"

As you can see, Firefox decided to interpret $10 as the value of the first capturing group $1 followed by the literal character 0. This is a valid implementation according to the specification, because the nn > m condition of the $nn clause leaves the result implementation-defined.

Replacement string in Guvante's answer: "$010xNUM"

As above, the $nn clause applies, since the $n clause forbids a following digit. Because 01 in $01 does not exceed the number of capturing groups (m = 2), the behavior is well-defined: the content of capturing group 1 is used in the replacement.

Therefore, Guvante's answer will return the same result on any compliant C++ implementation.
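
The two rules quoted from Table 22 can be sketched as a small reader for the digits that follow a '$'. This is only an illustration of the quoted rules, not how any standard library implements them; GroupRef, read_group_ref, and is_well_defined are hypothetical names introduced for this sketch.

```cpp
#include <string>

// Result of reading a numbered group reference that follows a '$'.
struct GroupRef {
    int number;  // capture group index named by the digits
    int digits;  // digits consumed: 1 for $n, 2 for $nn, 0 if no reference
};

// Hypothetical helper modelling only the two table rows quoted above.
// `after_dollar` is the replacement text immediately following the '$'.
GroupRef read_group_ref(const std::string& after_dollar) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    if (after_dollar.empty() || !is_digit(after_dollar[0])) {
        return GroupRef{0, 0};  // not a numbered reference
    }
    int first = after_dollar[0] - '0';
    if (after_dollar.size() > 1 && is_digit(after_dollar[1])) {
        // A second digit follows, so only the $nn row can apply.
        int value = first * 10 + (after_dollar[1] - '0');
        if (value < 1) return GroupRef{0, 0};  // "00" is outside 01..99
        return GroupRef{value, 2};
    }
    // A single digit not followed by another digit: the $n row.
    if (first < 1) return GroupRef{0, 0};  // "0" is outside 1..9
    return GroupRef{first, 1};
}

// True when the quoted rules fully determine the result for a pattern with
// m capturing groups; otherwise the result is implementation-defined.
bool is_well_defined(const GroupRef& ref, int m) {
    return ref.digits > 0 && ref.number <= m;
}
```

With m = 2, read_group_ref("10xNUM") reads group 10 using two digits, which is_well_defined reports as not well-defined, while read_group_ref("010xNUM") reads group 1 using two digits, which is well-defined.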

Community
nhahtdh

I tried to find a method of simply escaping the space or something so it wouldn't print, but I was unable to.

However, the text you are trying to add could simply be appended to the regex_replace output:

cout << regex_replace( "foobar0x1", regex_a, "$1" ) << "0xNUM" << endl;

The above line would give you the output you want.
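
Appending works here because the pattern matches the whole input and the literal text sits at the end of the replacement. A slightly more general sketch of the same idea, assuming only the first match needs replacing, builds the result from the match object instead of a format string. The name replace_with_group1 and its parameters are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the first match of `re` in `input` with
// capture group 1 followed by `literal_tail`. No format string is involved,
// so a digit after the group can never be read as part of its number.
std::string replace_with_group1(const std::string& input,
                                const std::regex& re,
                                const std::string& literal_tail) {
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return input;  // no match: leave the input unchanged
    }
    // Unmatched prefix + capture group 1 + literal text + unmatched suffix.
    return m.prefix().str() + m[1].str() + literal_tail + m.suffix().str();
}
```

For the question's input, replace_with_group1("foobar0x1", std::regex("(.*)bar(.*)"), "0xNUM") returns foo0xNUM.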

Zara Kay