
I'm debugging some code and wondered whether there is any practical difference between $1 and \1 in Perl regex substitutions.

For example:

my $package_name = "Some::Package::ButNotThis";

$package_name =~ s{^(\w+::\w+)}{$1};  

print $package_name; # Some::Package

The following line seems functionally equivalent:

$package_name =~ s{^(\w+::\w+)}{\1};

Are there subtle differences between these two statements? Do they behave differently in different versions of Perl?

Sinan Ünür
Mr Foo Bar
  • Perhaps related http://stackoverflow.com/questions/2890700/backreferences-syntax-in-replacement-strings-why-dollar-sign – polygenelubricants Jun 18 '10 at 08:47
  • If you run with `use warnings` enabled (and you absolutely should), your second example will produce a warning: `\1 better written as $1 ...`. – FMc Jun 18 '10 at 11:19
  • Casual observation: the regex substitution doesn't alter the string. – Zaid Jun 17 '14 at 16:17

2 Answers


First, you should always use warnings when developing:

#!/usr/bin/perl

use strict; use warnings;

my $package_name = "Some::Package::ButNotThis";

$package_name =~ s{^(\w+::\w+)}{\1};

print $package_name, "\n";

Output:

\1 better written as $1 at C:\Temp\x.pl line 7.

When you get a warning you do not understand, add diagnostics:

C:\Temp> perl -Mdiagnostics x.pl
\1 better written as $1 at x.pl line 7 (#1)
    (W syntax) Outside of patterns, backreferences live on as variables.
    The use of backslashes is grandfathered on the right-hand side of a
    substitution, but stylistically it's better to use the variable form
    because other Perl programmers will expect it, and it works better if
    there are more than 9 backreferences.

Why does it work better when there are more than 9 backreferences? Here is an example:

#!/usr/bin/perl

use strict; use warnings;

my $t = (my $s = '0123456789');
my $r = join '', map { "($_)" } split //, $s;

$s =~ s/^$r\z/\10/;
$t =~ s/^$r\z/$10/;

print "[$s]\n";
print "[$t]\n";

Output:

C:\Temp> x
]
[9]

If that does not clarify it, take a look at:

C:\Temp> x | xxd
0000000: 5b08 5d0d 0a5b 395d 0d0a                 [.]..[9]..

See also perlop:

The following escape sequences are available in constructs that interpolate and in transliterations …

\10 is octal for decimal 8. So the replacement part contained the character code for BACKSPACE rather than the contents of the tenth capture group.

NB

Incidentally, your code does not do what you want. It will not print Some::Package, contrary to what your comment says, because all you are doing is replacing Some::Package with Some::Package without touching ::ButNotThis.

You can either do:

($package_name) = $package_name =~ m{^(\w+::\w+)};

or

$package_name =~ s{^(\w+::\w+)(?:::\w+)*\z}{$1};
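
A quick way to check both suggestions, as a minimal sketch. The variable names `$from_match` and `$from_subst` are illustrative only and do not come from the question; the original string is copied so that each approach starts from the same input.

```perl
#!/usr/bin/perl

use strict; use warnings;

my $package_name = "Some::Package::ButNotThis";

# Option 1: a match in list context returns the captured groups,
# so the first capture can be assigned directly.
my ($from_match) = $package_name =~ m{^(\w+::\w+)};

# Option 2: the pattern consumes the whole string, so the
# substitution leaves only the first two components behind.
# (hypothetical copy-then-substitute idiom to keep $package_name intact)
(my $from_subst = $package_name) =~ s{^(\w+::\w+)(?:::\w+)*\z}{$1};

print "$from_match\n";
print "$from_subst\n";
```

Both lines should print Some::Package.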
Sinan Ünür
  • And if you want a laugh, you'll also install [warnin's](http://wonkden.net/modules.html). :) – Ether Jun 18 '10 at 16:25

From perldoc perlre:

The bracketing construct "( ... )" creates capture buffers. To refer to the current contents of a buffer later on, within the same pattern, use \1 for the first, \2 for the second, and so on. Outside the match use "$" instead of "\".

The \<digit> notation works in certain circumstances outside the match, but it can clash with octal escapes. This happens when the backslash is followed by more than one digit.
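
To illustrate the quoted rule, here is a small sketch (the sample string is made up for this example). Inside the pattern, `\1` refers back to whatever the first group captured; in the replacement, `$1` inserts that captured text.

```perl
#!/usr/bin/perl

use strict; use warnings;

my $text = "this is is a test";

# Inside the pattern, \1 must match the same text that group 1 captured,
# so this finds a word immediately repeated after a single space.
# In the replacement, $1 puts one copy of that word back.
$text =~ s/\b(\w+) \1\b/$1/;

print "$text\n";   # this is a test
```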

Eugene Yarmash
  • If you have at least one capture buffer within the regex, whatever was previously stored in $1 will be overwritten. – Jordan Lewis Jun 18 '10 at 08:50
  • It should be noted that in a replacement (`s/.../.../`), the first part is a match, the second is not. – Svante Jun 18 '10 at 08:54