I am having a problem correctly quoting the subpattern placeholder '$1' when passing it to the substitution operator 's///' in a variable. Could someone shed some light on this and tell me what I am doing wrong?
I am exporting a set of MS Word documents to HTML files. This works more or less OK, except that the files contain many cross-references, which need to be fixed to keep working. The exported references have the form 'href="../../somefilename.docx"' and need to be changed into 'href="somefilename.htm"' so that they point to the exported HTML files instead of the original Word files.
An example file test.htm could look like this:
```html
<html>
<body>
<a href="../../filename1.docx" />
<a href="../../filename2.docx" />
<a href="../../filename3.docx" />
<a href="../../filename4.docx" />
</body>
</html>
```
and the program execution should then yield:
```html
<html>
<body>
<a href="filename1.htm" />
<a href="filename2.htm" />
<a href="filename3.htm" />
<a href="filename4.htm" />
</body>
</html>
```
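For reference, the transformation from the example above can be reproduced on an in-memory string. This sketch hard-codes pattern and replacement like variant 1 of the program; the `[^"]*` character class instead of `(.*)` is my own assumption, chosen so that a match cannot run past the closing quote if two links ever end up on the same line (a greedy `.*` would then swallow both).

```perl
use strict;
use warnings;

# The sample input from test.htm, as one string.
my $html = <<'HTML';
<html>
<body>
<a href="../../filename1.docx" />
<a href="../../filename2.docx" />
<a href="../../filename3.docx" />
<a href="../../filename4.docx" />
</body>
</html>
HTML

# Hard-coded pattern and replacement: $1 appears directly in the replacement
# part, so it is interpolated after every match with the captured file name.
(my $fixed = $html) =~ s{href="\.\./\.\./([^"]*)\.docx"}{href="$1.htm"}g;

print $fixed;
```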
I wrote a little Perl program, 'ReplaceURLs', to do that job. It works fine if I hard-code the pattern and the replacement (i.e. if I put them directly into the s/.../.../g statement), see variant 1. To make it more flexible, however, I would like to pass both expressions in as arguments (i.e. s/$pattern/$subst/g), and I can't get that to work. I can pass the pattern in a variable (variant 2), but not the substitution value containing the capture-group reference $1. In variant 3, the $1 in the substitution value is not recognized as a reference to the captured subpattern; it is inserted literally as the text '$1'.
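The difference between variants 2 and 3 can be reproduced in isolation with a single link. As far as I understand perlop, the replacement part of s/// is interpolated like a double-quoted string: in variant 3 that inserts the value of the variable, but the characters $1 inside that value are not interpolated a second time. The variable names below are made up for this example.

```perl
use strict;
use warnings;

my $text    = '<a href="../../filename1.docx" />';
my $pattern = 'href="\.\./\.\./(.*)\.docx"';
my $subst   = 'href="$1.htm"';    # contains the two characters $ and 1

# Variant 2 style: $1 is written directly in the replacement part,
# so it is interpolated after the match and yields the captured name.
(my $direct = $text) =~ s/$pattern/href="$1.htm"/;

# Variant 3 style: interpolation inserts the *value* of $subst, and the
# text "$1" inside that value is not interpolated again.
(my $via_var = $text) =~ s/$pattern/$subst/;

print "$direct\n";     # prints: <a href="filename1.htm" />
print "$via_var\n";    # prints: <a href="$1.htm" />
```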
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $debug = 1;
my $tgtfilename = $ARGV[0] || die("usage: ReplaceURLs.pl <filename> <url-pattern> <url-substvalue>\n");
# Single quotes keep the backslashes, so \. matches a literal dot in the regex.
my $urlpattern  = $ARGV[1] || 'href="\.\./\.\./(.*)\.docx"';   # href="../../(filename).docx"
my $urlsubstval = $ARGV[2] || 'href="$1.htm"';                 # href="$1.htm" --> href="(filename).htm"
print "replacing all occurrences of pattern '$urlpattern' in file '$tgtfilename' with '$urlsubstval':\n";

# open & read $tgtfilename into a single string
open(my $ifh, '<', $tgtfilename) || die "unable to open $tgtfilename for reading: $!";
my $oldstring = do { local $/; <$ifh> };   # slurp mode: undefined $/ reads the whole file at once
close($ifh) || die "can't close file $tgtfilename: $!";
if ($debug) { print $oldstring, "\n"; }

# look for $urlpattern and replace it with $urlsubstval:
my $newstring;
# variant 1: works
#($newstring = $oldstring) =~ s!href="\.\./\.\./(.*)\.docx"!href="$1.htm"!g;
# variant 2: works
#($newstring = $oldstring) =~ s!$urlpattern!href="$1.htm"!g;
# variant 3: does not work - why?
($newstring = $oldstring) =~ s/$urlpattern/$urlsubstval/g;

# save file
#open(my $ofh, '>', $tgtfilename) || die "unable to re-open $tgtfilename for writing: $!";
#print $ofh $newstring;
#close($ofh) || die "can't close file $tgtfilename: $!";

# done
if ($debug) { print "result of replacement:\n", $newstring, "\n"; } else { print "done.\n"; }
__END__
```
If I run this using "perl ReplaceURLs.pl test.htm" I always get:
```html
<html>
<body>
<a href="$1.htm" />
<a href="$1.htm" />
<a href="$1.htm" />
<a href="$1.htm" />
</body>
</html>
```
instead of the desired result. How do I have to quote or escape the '$1' in $urlsubstval to get this working?
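One possible way around this, sketched here without claiming it is the recommended one, is to treat the substitution value as a template and expand $1, $2, ... by hand for each match. `expand_replace` is a hypothetical helper; it reads the capture groups from Perl's built-in @- and @+ offset arrays and never evaluates the template as Perl code.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $pattern in $text with $template,
# where $1, $2, ... in $template stand for the pattern's capture groups.
# The template is plain text; it is never evaluated as Perl code.
sub expand_replace {
    my ($text, $pattern, $template) = @_;
    my $orig = $text;    # @- / @+ hold offsets into this (unchanged) string
    $text =~ s{$pattern}{
        # Copy all capture groups of the current match first, because the
        # inner substitution below overwrites $1, @- and @+.
        my @groups = map {
            defined $-[$_] ? substr($orig, $-[$_], $+[$_] - $-[$_]) : ''
        } 0 .. $#-;
        # Replace each $N placeholder with group N (empty if there is none).
        (my $out = $template) =~ s/\$(\d+)/defined $groups[$1] ? $groups[$1] : ''/ge;
        $out;
    }ge;
    return $text;
}

print expand_replace('<a href="../../filename1.docx" />',
                     'href="\.\./\.\./(.*)\.docx"', 'href="$1.htm"'), "\n";
# prints: <a href="filename1.htm" />
```

In ReplaceURLs.pl, variant 3 would then become `$newstring = expand_replace($oldstring, $urlpattern, $urlsubstval);`. On a POSIX shell the arguments can be passed in single quotes so that the shell leaves $1 alone, e.g. `perl ReplaceURLs.pl test.htm 'href="\.\./\.\./(.*)\.docx"' 'href="$1.htm"'`. The `s///ee` modifier is often suggested for the same problem, but it evaluates the substitution value as Perl code, which is risky when that value comes from the command line.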
M.