I am having trouble correctly quoting the subpattern placeholder '$1' when passing it to the substitution operator 's///' in a variable. Could someone shed some light on this and tell me what I am doing wrong?

I am exporting a set of MS Word documents to HTML files. This works more or less OK, except that the files contain many cross-references, and these need to be fixed to keep working. The exported references have the form 'href="../../somefilename.docx"' and need to be changed to 'href="somefilename.htm"' so that they reference the exported HTML files instead of the original Word files.

An example file test.htm could look like this:

<html>
<body>
<a href="../../filename1.docx" />
<a href="../../filename2.docx" />
<a href="../../filename3.docx" />
<a href="../../filename4.docx" />
</body>
</html>

and the program execution should then yield:

<html>
<body>
<a href="filename1.htm" />
<a href="filename2.htm" />
<a href="filename3.htm" />
<a href="filename4.htm" />
</body>
</html>

I wrote a little Perl program, 'ReplaceURLs', to do that job. It works fine if I hardcode the pattern and the replacement expression (i.e. if I place them directly into the s/.../.../g statement); see variant 1. But to make it more flexible I would like to allow those expressions to be passed in as arguments (i.e. s/$pattern/$subst/g), and this I can't get working. I can pass in the pattern in a variable (see variant 2), but not the substitution value containing the subpattern reference $1. In variant 3, for some reason the $1 in the substitution value is not recognized as a subpattern reference but is treated as the literal text '$1'.

#!/usr/bin/perl

$debug = TRUE;
$tgtfilename = $ARGV[0] || die("usage: ReplaceURLs.pl <filename> <url-pattern> <url-substvalue>");
$urlpattern  = $ARGV[1] || "href=\"\.\./\.\./(.*)\.docx\"";  # href="../../(filename).docx';
$urlsubstval = $ARGV[2] || "href=\"\$1.htm\"";  # href="$1.htm" --> href="(filename).htm";

print "replacing all occurences of pattern '$urlpattern' in file '$tgtfilename' with '$urlsubstval':\n";

# open & read $tgtfilename
open($ifh, '<', $tgtfilename) || die "unable to open $tgtfilename for reading: $!";
@slurp = <$ifh>; 
$oldstring = "@slurp";
close($ifh)  || die "can't close file $tgtfilename: $!";
if ($debug) { print $oldstring,"\n"; }

# look for $urlpattern and replace it with $urlsubstval:

# variant 1: works
#($newstring = $oldstring) =~ s!href=\"\.\./\.\./(.*)\.docx\"!href=\"$1.htm\"!g;

# variant 2: works
#($newstring = $oldstring) =~ s!$urlpattern!href=\"$1.htm\"!g; 

# variant 3: does not work - why?
($newstring = $oldstring) =~ s/$urlpattern/$urlsubstval/g; 

# save file
#open($ofh, '>', $tgtfilename) || die "unable to re-open $tgtfilename for writing";
#print $ofh $newstring,"\n";
#close($ofh) || die "can't close file $tgtfilename: $!";

# done
if ($debug) { print "result of replacement:","\n", $newstring,"\n"; } else { print "done."; }
__END__

If I run this using "perl ReplaceURLs.pl test.htm" I always get:

<html>
 <body>
 <a href="$1.htm" />
 <a href="$1.htm" />
 <a href="$1.htm" />
 <a href="$1.htm" />
 </body>
 </html>

instead of the desired result. How do I need to quote or escape the '$1' in $urlsubstval to get this working?

M.

Kai
mmo
  • http://stackoverflow.com/questions/392643/how-to-use-a-variable-in-the-replacement-side-of-the-perl-substitution-operator – ysth Nov 13 '12 at 09:21

2 Answers

See perlop.

Options are as with m// with the addition of the following replacement specific options:

     e   Evaluate the right side as an expression.
     ee  Evaluate the right side as a string then eval the result.
     r   Return substitution and leave the original string untouched.

So, rather obscurely, you can store the replacement in a variable as a quoted Perl expression and apply it with /ee:

$ ls -1 | perl -pE '$str = q{"--$1--"}; s/(hah)/$str/ee;'
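
Applied to the href case from the question, a minimal sketch might look like the following. The variable names and the escaping step are assumptions of this example, not something perlop prescribes: the raw replacement text is first turned into the source of a double-quoted Perl string, so that the second evaluation interpolates $1 instead of trying to compile href=... as code. This still evals user-supplied text, so it is only suitable for trusted input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text        = qq{<a href="../../filename1.docx" />\n};
my $pattern     = 'href="\.\./\.\./(.*)\.docx"';
my $replacement = 'href="$1.htm"';    # raw text, as it would arrive in $ARGV[2]

# Build Perl source for a double-quoted string:
# escape backslash, double quote and @, but leave $ alone so $1 interpolates.
(my $expr = $replacement) =~ s/(["\\\@])/\\$1/g;
$expr = qq{"$expr"};                  # now: "href=\"$1.htm\""

# First /e yields the contents of $expr, second /e evals them as Perl code.
(my $new = $text) =~ s/$pattern/$expr/gee;
print $new;
```

The wrapping in double quotes plays the same role as the q{"--$1--"} in the one-liner above: what gets evaluated the second time is a string expression, not bare text.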
bobbogo
  • This is a good solution, but be aware that this would allow the user to run anything they want by specifying code as the replacement pattern. For example if they specified the replacement to be `'system("rm -rf /")'`... – dan1111 Nov 13 '12 at 11:42
  • Thanks! You got me going! The 'ee' was new to me. But the important trick was the q{...} to quote the string! My program still wouldn't work until I assigned this as $urlsubstval = q{}; – mmo Nov 13 '12 at 14:31

The solution by bobbogo only works if $str does not contain anything that interferes with Perl's syntax. Because I wanted the replacement to contain something that happens to look like a Perl assignment, namely 'href="$1.htm"', it yielded warnings such as 'Unquoted string "href" may clash with future reserved word ...' as well as errors such as 'Use of uninitialized value in substitution iterator at ...', and then crashed.

So the solution that finally worked for me was to construct the substitution command by string concatenation and then eval(...) the constructed command:

#!/usr/bin/perl

$debug = 1;
$tgtfilename = $ARGV[0] || die("usage: ReplaceURLs.pl <filename> [ <url-pattern> [ <url-substvalue> ] ]");
$urlpattern  = $ARGV[1] || 'href="\.\./\.\./(.*)\.docx"';  # href="../../<filename>.docx" in regexp format
$urlreplace  = $ARGV[2] || 'href="$1.htm"';  # href="$1.htm" --> href="<filename>.htm"; 

print "replacing all occurrences of pattern '$urlpattern' in file '$tgtfilename' with '$urlreplace':\n";

# open & read $tgtfilename
open($ifh, '<', $tgtfilename) || die "unable to open $tgtfilename for reading: $!";
@slurp = <$ifh>;
$oldstring = join('', @slurp);  # "@slurp" would insert a space before every line after the first
close($ifh)  || die "can't close file $tgtfilename: $!";
if ($debug) { print $oldstring,"\n"; }

# construct command to look for $urlpattern and replace it with $urlreplace:
$newstring = $oldstring;
$cmd = '$newstring =~ s!'.$urlpattern.'!'.$urlreplace.'!g';
# execute it:
if ($debug) { print "cmd=", $cmd, "\n"; }
eval($cmd);
die "substitution failed: $@" if $@;  # report syntax errors in the constructed command

# done
if ($debug) { 
    print "result of replacement:","\n", $newstring,"\n"; 
} else { 
    # save to file:
    open($ofh, '>', $tgtfilename) || die "unable to re-open $tgtfilename for writing";
    print $ofh $newstring,"\n";
    close($ofh) || die "can't close file $tgtfilename: $!";
    print "done."; 
}
__END__
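
Since eval runs whatever arrives in the arguments as code (the concern raised in the comment on the /ee answer), an eval-free variant may be worth considering. The following is only a sketch under assumptions: the helper names captures, expand_template and replace_all are made up for this example, and only placeholders of the form $1, $2, ... are expanded. It reads the capture groups from the offset arrays @- and @+ and substitutes them into the replacement text by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capture groups of the last successful match; index 0 is the whole match.
# Must be called before any other regex runs, because that would overwrite @- and @+.
sub captures {
    my ($source) = @_;
    return map {
        defined $-[$_] ? substr($source, $-[$_], $+[$_] - $-[$_]) : ''
    } 0 .. $#+;
}

# Expand $1, $2, ... in the template from the given captures, without eval.
sub expand_template {
    my ($template, @caps) = @_;
    $template =~ s/\$([1-9]\d*)/ defined $caps[$1] ? $caps[$1] : '' /ge;
    return $template;
}

# Replace every match of $pattern in $text with the expanded $template.
sub replace_all {
    my ($text, $pattern, $template) = @_;
    my $source = $text;    # untouched copy that the offsets in @- and @+ refer to
    $text =~ s/$pattern/expand_template($template, captures($source))/ge;
    return $text;
}

my $html = qq{<a href="../../filename1.docx" />\n<a href="../../filename2.docx" />\n};
print replace_all($html, 'href="\.\./\.\./(.*)\.docx"', 'href="$1.htm"');
```

With this approach the replacement argument is treated purely as data, so a '!' or a quote in it cannot break the constructed command. The pattern argument is still compiled as a regular expression, as in the original program.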
mmo