60

I would like to do the following:

$find = "start (.*) end";
$replace = "foo \1 bar";

$var = "start middle end";
$var =~ s/$find/$replace/;

I would expect $var to contain "foo middle bar", but it does not work. Neither does:

$replace = 'foo \1 bar';

Somehow I am missing something regarding the escaping.

Peter Mortensen
Manu

9 Answers

90

On the replacement side, you must use $1, not \1.

And you can only do what you want by making $replace an expression that evaluates to the result you want, and telling s/// to eval it with the /ee modifier, like so:

$find="start (.*) end";
$replace='"foo $1 bar"';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n";

To see why the "" and double /e are needed, see the effect of the double eval here:

$ perl
$foo = "middle";
$replace='"foo $foo bar"';
print eval('$replace'), "\n";
print eval(eval('$replace')), "\n";
__END__
"foo $foo bar"
foo middle bar

(Though as ikegami notes, a single /e or the first /e of a double e isn't really an eval(); rather, it tells the compiler that the substitution is code to compile, not a string. Nonetheless, eval(eval(...)) still demonstrates why you need to do what you need to do to get /ee to work as desired.)
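The three cases (no /e, one /e, two /e) can be compared side by side. This is only a minimal sketch; the variable names $none, $one and $two are made up for the illustration, and it assumes the same $find and $replace as above.

```perl
use strict;
use warnings;

my $find    = 'start (.*) end';
my $replace = '"foo $1 bar"';

# No /e: $replace is interpolated once, like qq/$replace/; the $1 inside it stays literal.
(my $none = "start middle end") =~ s/$find/$replace/;

# One /e: the replacement is the Perl expression $replace, whose value is that same string.
(my $one = "start middle end") =~ s/$find/$replace/e;

# Two /e: that value is then passed through eval, which finally expands $1.
(my $two = "start middle end") =~ s/$find/$replace/ee;

print "$none\n";    # "foo $1 bar"
print "$one\n";     # "foo $1 bar"
print "$two\n";     # foo middle bar
```

Only the third form turns the contents of $replace into code, which is why the inner double quotes are needed there.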

ysth
  • 3
    Nice example of double evaluation! – PolyThinker Dec 25 '08 at 13:24
  • 3
    That's a very nice explanation of the double evaluation :) – brian d foy Dec 25 '08 at 19:46
  • 3
    Do of course note that eval is really dangerous for web apps, especially given arbitrary strings that can't be filtered. Please see my comments for why I saw the eval way to do it and then decided not to tell the user about it!. – Kent Fredric Dec 26 '08 at 04:40
  • 1
    @Kent Fredric: Yes, absolutely there is danger if $foo or $replace come from user input, but that didn't seem likely to me from the question. And (as I see you point out) taint mode will prevent an unvetted $replace from being used. – ysth Dec 26 '08 at 21:41
  • I attempted this with `$find=shift; $replace=shift; s/$find/$replace/e for @ARGV;` with a few variations: quoting (append or sprintf) when assigning to `$replace`, `s/$find/'"$replace"'/ee`, and a few others. The 1st 2 worked, the 3rd did not ... why? – Brian Vandenberg Jun 20 '13 at 21:48
  • @BrianVandenberg: I don't understand from that what you tried or why it didn't work; are you sure you are getting your `$` through the shell's interpolation and into `@ARGV`? – ysth Jun 20 '13 at 22:14
  • Re "That's a very nice explanation of the double evaluation". Except it's completely wrong. `s/.../$replace/ee` only does one `eval()`. The RHS of `s/.../$replace/` is equivalent to `qq/$replace/`. Add one `e` and it becomes equivalent to `$replace`. Add a second `e` and it becomes equivalent to `my $rv = eval($replace); die $@ if $@; $rv` – ikegami Nov 24 '15 at 15:31
  • @ikegami: I can kind of see what you are saying if I think about limited cases from a funny angle. but think about `s/42/6*9/e`; there is certainly what I would call an eval there – ysth Nov 24 '15 at 17:52
  • Nope, no `eval()`. No data is turned into code. `s/42/6*9/` ⇒ `say qq/6*9/;` ||| `s/42/6*9/e` ⇒ `say 6*9;` ||| `s/42/6*9/ee` ⇒ `say eval(6*9);`. That's why `/e` is safe, but `/ee` isn't. You can test this [here](http://pastebin.com/dyk5BeWz). – ikegami Nov 24 '15 at 18:35
  • I never said there was "an `eval()`" or that data is turned into code. The /e changes whether the replacement in the source is data or code, as you say. The first eval() in my example reflects what the compiler is doing, not what happens at runtime. Feel free to edit it if you think you can make it better. – ysth Nov 24 '15 at 19:23
  • @ikegami added a note to the end of the answer – ysth Mar 14 '19 at 17:59
13

Deparse tells us this is what is being executed:

$find = 'start (.*) end';
$replace = "foo \cA bar";
$var = 'start middle end';
$var =~ s/$find/$replace/;

However,

s/$find/foo \1 bar/

is interpreted as:

$var =~ s/$find/foo $1 bar/;

Unfortunately it appears there is no easy way to do this.

You can do it with a string eval, but that's dangerous.

The sanest solution that worked for me was this:

$find = "start (.*) end"; 
$replace = 'foo \1 bar';

$var = "start middle end"; 

sub repl { 
    my $find = shift; 
    my $replace = shift; 
    my $var = shift;

    # Capture first 
    my @items = ( $var =~ $find ); 
    $var =~ s/$find/$replace/; 
    for( reverse 0 .. $#items ){ 
        my $n = $_ + 1; 
        #  Many More Rules can go here, ie: \g matchers  and \{ } 
        $var =~ s/\\$n/${items[$_]}/g ;
        $var =~ s/\$$n/${items[$_]}/g ;
    }
    return $var; 
}

print repl $find, $replace, $var; 

A rebuttal against the ee technique:

As I said in my answer, I avoid evals for a reason.

$find="start (.*) end";
$replace='do{ print "I am a dirty little hacker" while 1; "foo $1 bar" }';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n";

This code does exactly what you think it does.

If your substitution string is in a web application, you just opened the door to arbitrary code execution.

Good Job.

Also, it WON'T work with taint mode turned on, for this very reason.

$find="start (.*) end";
$replace='"' . $ARGV[0] . '"';

$var = "start middle end";
$var =~ s/$find/$replace/ee;

print "var: $var\n"


$ perl /tmp/re.pl  'foo $1 bar'
var: foo middle bar
$ perl -T /tmp/re.pl 'foo $1 bar' 
Insecure dependency in eval while running with -T switch at /tmp/re.pl line 10.

However, the more careful technique is sane, safe, secure, and doesn't fail taint. (Be assured, though, that the string it emits is still tainted, so you don't lose any security.)
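The same capture-then-substitute idea can be stretched to replace every match, still without eval. What follows is a sketch under stated assumptions, not part of the original repl: the name repl_all is hypothetical, only the \N and $N forms are handled, and captures that did not take part in a match become empty strings.

```perl
use strict;
use warnings;

# Hypothetical global variant of repl: every match of $find is replaced, and
# \N or $N in $replace is filled in from that match's capture groups.
sub repl_all {
    my ($find, $replace, $text) = @_;
    my ($out, $last) = ('', 0);

    while ($text =~ /$find/g) {
        my ($start, $end) = ($-[0], $+[0]);

        # Copy this match's captures before running another regex.
        my @caps = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : ''
        } 1 .. $#+;

        # Expand the template in one left-to-right pass; inserted text is never rescanned.
        (my $filled = $replace) =~
            s/[\\\$]([1-9]\d*)/defined $caps[$1 - 1] ? $caps[$1 - 1] : ''/ge;

        # Keep the unmatched text between the previous match and this one.
        $out .= substr($text, $last, $start - $last) . $filled;
        $last = $end;
    }
    return $out . substr($text, $last);
}

print repl_all('start (.*) end', 'foo \1 bar', 'start middle end'), "\n";   # foo middle bar
print repl_all('(\w)(\d)', '$2$1', 'a1 b2'), "\n";                          # 1a 2b
```

Because the replacement template is expanded once per match and the result is never scanned again, a capture that itself contains something like $1 is inserted literally, and $12 is read as group 12 rather than group 1 followed by a 2.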

Kent Fredric
  • 1
    The easy way is ysth's answer. :) – brian d foy Dec 25 '08 at 19:45
  • 2
    It depends on from where the data that's evaluated comes. Avoiding eval is generally a good idea. – PEZ Dec 27 '08 at 10:35
  • 3
    No, avoiding eval is not generally a good idea. Using it only with care is. – ysth Dec 28 '08 at 07:08
  • 3
    Telling new users to use eval, however, is not advisable. – Kent Fredric Dec 28 '08 at 08:00
  • Thanks for sharing the `repl` sub routine! This helped me.. I assume you use `reverse` in `reverse 0 .. $#items` in order to cope with mixed one-digit and two-digit numbers like `$12` and `$1`? – Håkon Hægland Mar 14 '15 at 09:37
  • @Håkon I believe so. My code does inherently also suffer a potential cross reference problem, say for instance $12 contains $1, ... That won't be preserved. Though code with that issue should be incredibly rare – Kent Fredric Mar 14 '15 at 12:36
  • Hi Kent, I think I solved that problem and I also added some more features to your code to make it more robust and also allow the user to escape the dollar sign in the replacement string. The link is here: https://github.com/hakonhagland/perl_regex_substitute – Håkon Hægland Mar 15 '15 at 17:43
8

As others have suggested, you could use the following:

my $find = 'start (.*) end';
my $replace = '"foo $1 bar"';   # Must be a Perl expression. 'foo \1 bar' is an error.
my $var = "start middle end";
$var =~ s/$find/$replace/ee;

The above is short for the following:

my $find = 'start (.*) end';
my $replace = '"foo $1 bar"';
my $var = "start middle end";
$var =~ s/$find/ eval($replace) /e;

I prefer the second to the first since it doesn't hide the fact that eval(EXPR) is used. However, both of the above silence errors, so the following would be better:

my $find = 'start (.*) end';
my $replace = '"foo $1 bar"';
my $var = "start middle end";
$var =~ s/$find/ my $r = eval($replace); die $@ if $@; $r /e;
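To illustrate what "silence errors" means in practice, here is a small sketch. The malformed replacement string $bad and the names $silent, $loud, $ok and $error are made up for the demonstration.

```perl
use strict;
use warnings;

my $find = 'start (.*) end';
my $bad  = '"foo $1 bar';    # hypothetical typo: the closing quote is missing

# With /ee the failed eval just yields undef, so the match is replaced by nothing.
my $silent = "start middle end";
{
    no warnings 'uninitialized';
    $silent =~ s/$find/$bad/ee;
}

# Checking $@ turns the same mistake into an exception the caller can see.
my $loud = "start middle end";
my $ok   = eval {
    $loud =~ s/$find/ my $r = eval($bad); die $@ if $@; $r /e;
    1;
};
my $error = $ok ? '' : $@;

print "silent result: '$silent'\n";    # silent result: ''
print "error caught:  $error";
```

With the first form the only hint of trouble is an "uninitialized value" warning, and only if warnings are enabled.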

But as you can see, all of the above allow for the execution of arbitrary Perl code. The following would be far safer:

use String::Substitution qw( sub_modify );

my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
sub_modify($var, $find, $replace);
ikegami
7
# perl -de 0
$match="hi(.*)"
$sub='$1'
$res="hi1234"
$res =~ s/$match/$sub/gee
p $res
  1234

Be careful, though. This causes two layers of eval to occur, one for each e at the end of the regex:

  1. $sub --> $1
  2. $1 --> final value, in the example, 1234
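Because the second layer evaluates the contents of $sub as Perl code, a replacement that mixes literal text with $1 has to carry its own quotes. A minimal sketch, assuming the same $match as above and a made-up replacement text:

```perl
use strict;
use warnings;

my $match = 'hi(.*)';
my $sub   = '"result=$1"';    # the inner double quotes make this a valid Perl string expression

(my $res = "hi1234") =~ s/$match/$sub/gee;
print "$res\n";    # result=1234
```

Without the inner quotes, the second evaluation would try to run result=$1 as code rather than build a string.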
eruciform
  • 1
    As in your example, note that the assignment of `$sub='$1'` must be exactly that. `$sub='\1'` is interpreted as a reference, and `$sub="$1"` attempts to perform variable interpolation. The OP is probably better served by some form of template library at the end of the day IMHO, but still an interesting example. Thanks. – dawg Jul 17 '10 at 01:37
  • This only happens to work by accident because $sub does not contain anything interfering with Perl's syntax. But assume e.g. that I want $sub to contain some string that happens to look like an assignment, e.g. "result=$1" (i.e. attempting to print out "result=1234"). Then you will get a warning 'unquoted string "result" may clash with future reserved word at ...' plus an error 'use of uninitialized value in substitution iterator at ...' and your program will crash. So, a solution that allows one to define an arbitrary $sub containing the placeholder $1 in some arbitrary position is still missing! – mmo Nov 13 '12 at 23:20
1

I would suggest something like:

$text =~ m{(.*)$find(.*)};
$text = $1 . $replace . $2;

It is quite readable and seems to be safe. If multiple replace is needed, it is easy:

while ($text =~ m{(.*)$find(.*)}){
     $text = $1 . $replace . $2;
}
sth
1
#!/usr/bin/perl

$sub = "\\1";
$str = "hi1234";
$res = $str;
$match = "hi(.*)";
$res =~ s/$match/$1/g;

print $res

This got me the '1234'.

rmk
  • the whole point is though that I want $match and $sub to be arbitrary strings so that $sub can contain \1 with the same meaning – ldog Jul 17 '10 at 00:06
  • 3
    Can you explain your question a bit more? It's not clear what you want to achieve here... – rmk Jul 17 '10 at 00:09
1

See THIS previous SO post on using a variable on the replacement side of s/// in Perl. Look at both the accepted answer and the rebuttal answer.

What you are trying to do is possible with the s///ee form, which performs a double eval on the right-hand string. See the quote-like operators section of perlop for more examples.

Be warned that eval has security implications, and that this will not work in taint mode.

Community
dawg
  • +1: cool, i didn't see the dup. you're right, this should be closed and collated... – eruciform Jul 17 '10 at 02:03
  • 3
    What am I missing? The links appear to be to this question and some of its answers. Was there indeed an earlier question that no longer exists? (Easy for me to ask six years after the fact, right? ;) ) – cxw Dec 15 '16 at 17:00
1

I did not manage to make the most popular answers work.

  • The ee method complained when my replacement string contained several consecutive backreferences.
  • Kent Fredric's answer only replaced the first match, and I need my search and replace to be global. I did not figure out a way to make it replace all matches that didn't cause other issues. For example, I tried running the method recursively until it no longer caused the string to change, but that causes an infinite loop if the replacement string contains the search string, whereas a regular global replacement does not do that.

I attempted to come up with a solution of my own using plain old eval:

eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';

Of course, this allows for code injection. But as far as I know, the only way to escape the regex query and inject code is to insert two forward slashes in $find or one in $replace, followed by a semicolon, after which you can add code. For example, if I set the variables this way:

my $find = 'foo';
my $replace = 'bar/; print "You\'ve just been hacked!\n"; #';

The evaluated code is this:

$var =~ s/foo/bar/; print "You've just been hacked!\n"; #/gsu;

So what I do is make sure the strings don't contain any unescaped forward slashes.

First, I copy the strings into dummy strings.

my $findTest = $find;
my $replaceTest = $replace;

Then, I remove all escaped backslashes (backslash pairs) from the dummy strings. This allows me to find forward slashes that are not escaped, without falling into the trap of considering a forward slash escaped if it's preceded by an escaped backslash. For example: \/ contains an escaped forward slash, but \\/ contains a literal forward slash, because the backslash is escaped.

$findTest =~ s/\\\\//gmu;
$replaceTest =~ s/\\\\//gmu;

Now if any forward slash that is not preceded by a backslash remains in the strings, I throw a fatal error, as that would allow the user to insert arbitrary code.

if ($findTest =~ /(?<!\\)\// || $replaceTest =~ /(?<!\\)\//)
{
  print "String must not contain unescaped slashes.\n";
  exit 1;
}

Then I eval.

eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';

I'm not an expert at preventing code injection, but I'm the only one using my script, so I'm content using this solution without fully knowing if it's vulnerable. But as far as I know, it may be, so if anyone knows if there is or isn't any way to inject code into this, please provide your insight in a comment.
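On the question of whether the slash check is enough: the replacement side of s/// is interpolated like a double-quoted string, so code can run without any slash at all. The sketch below is only an illustration; the flag $ran is hypothetical and stands in for whatever an attacker would actually execute.

```perl
use strict;
use warnings;

our $ran = 0;    # hypothetical flag standing in for "arbitrary code ran"

my $var     = 'foo';
my $find    = 'foo';
my $replace = '@{[ $main::ran = 1 ]}';    # contains no forward slash at all

# Same slash check as above: both strings pass it.
(my $findTest    = $find)    =~ s/\\\\//g;
(my $replaceTest = $replace) =~ s/\\\\//g;
die "String must not contain unescaped slashes.\n"
    if $findTest =~ /(?<!\\)\// || $replaceTest =~ /(?<!\\)\//;

eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';

print "ran=$ran var=$var\n";    # ran=1 var=1
```

The @{[ ... ]} construct is ordinary string interpolation, so filtering slashes alone does not make the eval safe for untrusted input.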

Quote
-6

I'm not certain what you're trying to achieve. But maybe you can use this:

$var =~ s/^start/foo/;
$var =~ s/end$/bar/;

I.e. just leave the middle alone and replace the start and end.

PEZ