
I am writing a Perl script where the user can input a regex and a replacement string. The script will search a set of files and apply changes according to Perl's s/// operator, using the user's input as the pattern and the replacement.

To complicate matters slightly, the replacement string is allowed to contain backreferences to capture groups in the regex. For example, if the regex is b(.*?)a and the replacement string is a$1b, the $1 should not be treated literally, but rather as a backreference to capture group number one.
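A small illustration of why a plain s/// is not enough here (the sample string `xbyaz` is made up for the example): when the replacement comes from a variable, Perl interpolates the variable once, so a `$1` inside its *value* is inserted as literal text.

```perl
use strict;
use warnings;

my $str         = 'xbyaz';
my $replacement = 'a$1b';    # as typed by the user: $1 here is just text

# Replacement written in the source code: $1 is interpolated after each match.
(my $wanted = $str) =~ s/b(.*?)a/a$1b/;

# Replacement taken from a variable: its content 'a$1b' is inserted verbatim.
(my $literal = $str) =~ s/b(.*?)a/$replacement/;

print "$wanted\n";     # xaybz
print "$literal\n";    # xa$1bz
```

This gap between the two results is what makes the `ee` modifier tempting.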

Given this, is it possible to use the ee modifier (to evaluate the backreferences in the user input) safely with the s/// operator when the right-hand side of the operator is supplied by the user? For example:

use strict;
use warnings;
my $str = 'abaaca';

my $replacement = 'do{ use Env qw(HOME); unlink "$HOME/important.txt" }';

$str =~ s/a(.*?)a/$replacement/gee;

would be unfortunate, since the second e makes Perl evaluate the user's string as code. So I got the idea to escape double quotes and dollar signs (those not followed by a digit) in the user input, wrap the result in a pair of double quotes, and then do the replacement:

use feature qw(say);
use strict;
use warnings;

my $str = 'abaaca';

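my $replacement = shift;                    # replacement string from the command line
$replacement =~ s/\"/\\\"/g;                # escape double quotes: " becomes \"
$replacement =~ s/\$(?!\d)/\\\$/g;          # escape $ unless followed by a digit ($1, $2, ...)
$replacement = '"' . $replacement . '"';    # wrap in double quotes so the eval yields a string
$str =~ s/a(.*?)a/$replacement/gee;         # ee: the quoted string is eval'ed as Perl on each match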
say $str;

At first glance this seems to work, but have I missed something? For example, if the script is called test.pl and the user runs it as:

$ test.pl 'do{ "a$b" }'

the output is, as desired, just a plain string (no code is evaluated):

do{ "a$b" }do{ "a$b" }

So the question is: Is this really a safe/correct approach?

Håkon Hægland
  • If the user is simply running your script on a machine that they already have access to, why couldn't they just edit it to do whatever evil things they wanted? Or, for that matter, write their own Perl script that does evil things? – ThisSuitIsBlackNot Mar 17 '15 at 18:57
  • Sure, that is a good point, but I was thinking about the case where the user accidentally types something that results in code being executed by the `s///` operator, which could have undesired consequences – Håkon Hægland Mar 17 '15 at 18:59

1 Answer


Problem 1:

There's no way to replace with $1 followed by a literal 1. The usual way to write that, ${1}1, has its $ escaped (the $ is not followed by a digit), so it is inserted literally as ${1}1:

$ script '${1}1'
${1}1${1}1
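Running only the question's escaping steps shows what reaches the second e. In the sketch below, `quote_like_question` is a hypothetical name for a helper whose body is the question's code; the one `eval` is applied to a known, harmless string.

```perl
use strict;
use warnings;

# The question's escaping, wrapped in a hypothetical helper.
sub quote_like_question {
    my ($replacement) = @_;
    $replacement =~ s/\"/\\\"/g;          # " becomes \"
    $replacement =~ s/\$(?!\d)/\\\$/g;    # $ not followed by a digit becomes \$
    return '"' . $replacement . '"';
}

my $quoted = quote_like_question('${1}1');
print "$quoted\n";      # "\${1}1"

# Evaluating this particular string shows what /ee would insert per match.
my $inserted = eval $quoted;
print "$inserted\n";    # ${1}1
```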

Problem 2:

$ script '\${ system "echo rm -rf /" }'
rm -rf /
Use of uninitialized value in substitution iterator at a.pl line 12.
rm -rf /
Use of uninitialized value in substitution iterator at a.pl line 12.
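The escaping never touches backslashes, so the user's own backslash pairs with the one added in front of `$`. The eval then sees `\\` (a literal backslash) followed by an unescaped `${ ... }` dereference, whose block is run. The sketch below builds the string without evaluating it; `quote_like_question` is a hypothetical wrapper around the question's code.

```perl
use strict;
use warnings;

# The question's escaping, wrapped in a hypothetical helper.
sub quote_like_question {
    my ($replacement) = @_;
    $replacement =~ s/\"/\\\"/g;          # " becomes \"
    $replacement =~ s/\$(?!\d)/\\\$/g;    # $ not followed by a digit becomes \$
    return '"' . $replacement . '"';
}

# Only build and print the string; it is deliberately NOT eval'ed here.
my $quoted = quote_like_question('\${ system "echo rm -rf /" }');
print "$quoted\n";    # "\\${ system \"echo rm -rf /\" }"
```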

Problem 3:

$ script '$1{ system "echo rm -rf /" }'
rm -rf /
Use of uninitialized value within %1 in string at (eval 1) line 1.
rm -rf /
Use of uninitialized value within %1 in string at (eval 2) line 1.
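Here the `$` is followed by a digit, so it is deliberately left unescaped, and inside double quotes `$1{ ... }` is an element of the hash `%1` (hence the warning mentioning `%1`). Its subscript is ordinary Perl code. The sketch below shows the escaped string (not evaluated) and then a harmless interpolation demonstrating that a subscript inside "..." runs; `quote_like_question` and `%h` are hypothetical names.

```perl
use strict;
use warnings;

# The question's escaping, wrapped in a hypothetical helper.
sub quote_like_question {
    my ($replacement) = @_;
    $replacement =~ s/\"/\\\"/g;          # " becomes \"
    $replacement =~ s/\$(?!\d)/\\\$/g;    # $ not followed by a digit becomes \$
    return '"' . $replacement . '"';
}

# $1 is left alone, so the eval would see a hash element of %1.
my $quoted = quote_like_question('$1{ system "echo rm -rf /" }');
print "$quoted\n";          # "$1{ system \"echo rm -rf /\" }"

# Harmless demonstration: the subscript expression is evaluated during interpolation.
my %h = (KEY => 'value');
my $interpolated = "$h{ uc 'key' }";
print "$interpolated\n";    # value
```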

Surely, there are others. Solution:

Use String::Substitution.
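For readers who cannot install a CPAN module, here is a minimal core-Perl sketch of the same idea: expand only `$N` and `${N}` from the captured groups and never pass the user's text to `eval`. `expand_template` and `safe_gsub` are hypothetical names, not String::Substitution's API. The sketch supports numbered groups only (no `\1`, named captures, or escapes like `\n`), and treating `$0` as the whole match is a choice made here.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $N or ${N} in $template with $groups[$N]
# ($0 = whole match in this sketch). Missing or unmatched groups become ''.
# The template is only scanned as text; it is never eval'ed.
sub expand_template {
    my ($template, @groups) = @_;
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        defined $groups[$n] ? $groups[$n] : ''
    /ge;
    return $template;
}

# Hypothetical helper: like $str =~ s/$re/$template/g, with safe $N expansion.
sub safe_gsub {
    my ($str, $re, $template) = @_;
    my $orig = $str;    # @- and @+ hold offsets into this unmodified copy
    $str =~ s/$re/
        my @groups = map {
            defined $-[$_] ? substr($orig, $-[$_], $+[$_] - $-[$_]) : undef
        } 0 .. $#-;
        expand_template($template, @groups)
    /ge;
    return $str;
}

print safe_gsub('abaaca', qr/a(.*?)a/, 'a$1b'), "\n";     # abbacb
print safe_gsub('abaaca', qr/a(.*?)a/, '${1}1'), "\n";    # b1c1
print safe_gsub('abaaca', qr/a(.*?)a/, '$1{ system "echo rm -rf /" }'), "\n";
```

The last call prints the text verbatim after each group value; nothing is executed, and `${1}1` now gives "group 1 followed by 1" (Problems 1 and 3 above).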

ikegami
  • How does problem 3 actually work? If I type `script '$1{ say "" }'` I get an error `Use of uninitialized value within %1 in string at (eval 1) line 1.` I am stuck here: why do I get this error message, and why does it say `%1`? – Håkon Hægland Mar 17 '15 at 20:15
  • Maybe if you printed something other than empty line, it would be more noticeable that you printed something... – ikegami Mar 17 '15 at 20:18
  • Ok, I see it is a hash key... nice :) – Håkon Hægland Mar 17 '15 at 20:20
  • So `$h{say}=1` will set key `say` of `%h` to 1, but `$h{ my_func() }=1` will use the return value of `my_func` as a hash key.. it means that Perl does some checking to see if the key looks like code or like a bareword key.. Is there a reference to this logic in the documentation? – Håkon Hægland Mar 17 '15 at 20:31
  • It's well established that the key expression of a hash lookup can be a bareword, but it does not appear to be documented. It should be in [perldata](http://perldoc.perl.org/perldata.html) – ikegami Mar 18 '15 at 14:33
  • @Håkon Hægland, Technically, it's not undocumented; it's incorrectly documented. The docs say that literals with no other meaning are treated as string literals that return themselves, and `use strict qw( subs );` explicitly exempts "a simple identifier (no colons) and that it appears in curly braces or on the left hand side of the `=>` symbol." But that's a lie. `time` is treated as a string literal in `$foo{time}` even though it has another meaning. `$f{word}` isn't just a bareword; it's a [distinct feature](https://stackoverflow.com/a/58263207/589924). – ikegami Feb 02 '21 at 22:15
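A small sketch of the behaviour discussed in these comments; `my_func` is a hypothetical helper standing in for the one in the question.

```perl
use strict;
use warnings;

sub my_func { return 'from_func' }    # hypothetical helper

my %h;
$h{say}         = 1;    # simple identifier in braces: the key is the string 'say'
$h{ my_func() } = 2;    # any other expression: the key is its value
$h{time}        = 3;    # 'time' is a builtin, yet the key is still the string 'time'

print join(',', sort keys %h), "\n";    # from_func,say,time
```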