Just for fun: I'm very new to Perl and I'm trying to write a simple text processing tool, but I'm stuck on something simple. The tool's rules are read from a plain text file (not from the script itself, which is probably the crucial point) and form an array of pattern/replace pairs used to process a text file (each rule is applied to each line). Here's the sub that applies the transformations:

my ($text, @rules) = @_;
my @lines = split(/\n/, $text);
foreach ( @rules ) {
    my $pattern = $_->{"pattern"};
    my $replace = $_->{"replace"};
    # apply the rule to every line in place
    s/$pattern/$replace/g for @lines;
}
return join("\n", @lines);

For instance, if there's a rule like pattern=[aeiou] + replace=*, then text Foo bar is processed into F** b*r. That's what I want.

However, I can't see why capture groups don't work in the replacement. Say pattern=([fF]) + replace=<$1>: this results in <$1>oo bar, but I'm expecting <F>oo bar. I guess I'm missing something very simple. What am I missing?
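A minimal script reproducing the difference (the variable names are illustrative, not from the tool). When the replacement text comes from a variable, Perl interpolates the variable once and inserts its contents as they are; a $1 that lives inside those contents is never interpolated a second time. A $1 written directly in the replacement part of the source code is interpolated after every match.

```perl
use strict;
use warnings;

# Values as they would arrive from the rules file: plain strings.
my $pattern = '([fF])';
my $replace = '<$1>';    # the four characters <, $, 1, >

# $replace is interpolated once; its contents are not interpolated again,
# so the literal text <$1> is inserted.
(my $from_variable = 'Foo bar') =~ s/$pattern/$replace/g;
print "$from_variable\n";

# Here $1 appears in the source of the replacement itself, so it is
# interpolated after each match.
(my $from_source = 'Foo bar') =~ s/$pattern/<$1>/g;
print "$from_source\n";
```

The first line printed is <$1>oo bar and the second is <F>oo bar, which matches the behaviour described in the question.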

UPDATE:

After some experiments, my final result is:

sub escapeSubstLiteral {
    my ($literal) = @_;
    $literal =~ s/\//\\\//g;
    $literal;
}

sub subst {
    my ($pattern, $replace, $modifiers) = @_;
    $modifiers ||= '';
    my $expression = '$text =~ s/' . escapeSubstLiteral($pattern) . '/' . escapeSubstLiteral($replace) . '/' . $modifiers;
    return sub {
        my ($text) = @_;
        eval $expression;
        $text;
    };
}

$customSubst = subst($pattern, $replace, $modifiersToken);
$foo = $customSubst->($foo);
$bar = $customSubst->($bar);
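A sketch of a variant of the same string-eval idea, under a few assumptions of my own: compile_subst and escape_subst_literal are hypothetical names, the generated s/// is compiled once into a code ref instead of being re-evaluated on every call, a rule that fails to compile dies with Perl's error message instead of silently returning the text unchanged, and the modifier token is restricted to gimsx so it cannot append extra code to the eval. The replacement is still interpolated as a Perl string, so the rules file still has to be trusted.

```perl
use strict;
use warnings;

# Same escaping as escapeSubstLiteral above: only '/' is escaped.
sub escape_subst_literal {
    my ($literal) = @_;
    $literal =~ s{/}{\\/}g;
    return $literal;
}

# Hypothetical variant of subst(): builds "sub { ... s/PAT/REPL/MODS ... }",
# compiles it once and returns the resulting code ref.
sub compile_subst {
    my ($pattern, $replace, $modifiers) = @_;
    $modifiers //= '';
    # Only plain regex modifiers are allowed, so the token cannot inject code.
    die "unsupported modifiers '$modifiers'\n" if $modifiers =~ /[^gimsx]/;
    my $code = 'sub { my ($text) = @_; $text =~ s/'
             . escape_subst_literal($pattern) . '/'
             . escape_subst_literal($replace) . '/' . $modifiers
             . '; return $text }';
    my $sub = eval $code;
    die "cannot compile rule '$pattern' => '$replace': $@" unless ref $sub eq 'CODE';
    return $sub;
}

my $angle = compile_subst('([fF])', '<$1>', 'g');
my $out   = $angle->('Foo bar');
print "$out\n";    # <F>oo bar
```

Compiling once matters mostly when the same rule is applied to many lines, as in the rule loop from the question.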
Lyubomyr Shaydariv
  • http://stackoverflow.com/questions/392643/how-to-use-a-variable-in-the-replacement-side-of-the-perl-substitution-operator – Ashalynd Jul 26 '14 at 21:53
  • @Ashalynd yes, I saw that post. `/e` seems to have no effect and reproduces the behavior above. Doubling the `e` modifier turns it into `oo bar` (no `F`). – Lyubomyr Shaydariv Jul 26 '14 at 21:56
  • @jm666 I'm still not sure I understood what you mean. Please correct me if I'm wrong: does `$1` (and so on) mean a special literal thing in the text replacement scope and probably cannot be substituted with a variable? And, as a result, for example, `my $REPLACE='"(beep: $1)"'; return $INPUT =~ s/$PATTERN/$REPLACE/eegr;` is able to return the correct substitution because it's just evaluated twice as a script (as if I could use `s/$PATTERN/(beep: $1)/gr`)? – Lyubomyr Shaydariv Jul 26 '14 at 23:17
  • You could add `"` at the front and end of every replace part of the rule and use `/geer` but then it wouldn't work if you had a `"` in the replace rule already. `my $replace = '"' . $_->{"replace"} . '"';` – hmatt1 Jul 26 '14 at 23:50
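A small check of the double-evaluation reading in the comments above, reusing the $PATTERN/$REPLACE names from the comment (the input string is an assumption). With one /e the replacement is the Perl expression $REPLACE, whose value is just the stored string, quotes included. With /ee that value is evaluated again as Perl code; since the stored text is a double-quoted string literal, $1 gets interpolated. The `oo bar` result reported for /ee with an unquoted <$1> is most likely because <$1> is then evaluated as code, where angle brackets are the readline operator.

```perl
use strict;
use warnings;
use 5.014;    # for say and the /r modifier

my $input   = 'Foo bar';
my $PATTERN = '([fF])';
my $REPLACE = '"(beep: $1)"';    # the double quotes are part of the stored text

# One /e: evaluates the expression $REPLACE, yielding the stored text as is.
my $once  = $input =~ s/$PATTERN/$REPLACE/er;

# Two /e: the stored text is evaluated as Perl code, a "..." literal, so $1 interpolates.
my $twice = $input =~ s/$PATTERN/$REPLACE/eer;

say $once;     # "(beep: $1)"oo bar
say $twice;    # (beep: F)oo bar
```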

2 Answers

If your replacement string contains capture variables then you need to evaluate it as a string, so it needs to be enclosed in double quotes and the substitution needs to do a double eval. If you first escape any double quotes that are already in there then it will work that way regardless of whether there are any capture variables in there.

Something like this should suit you. By the way, I'm not sure how useful it is to split the string into lines before doing the substitution: it makes a difference only for patterns that can match a newline (. can't without an /s modifier, although \s and negated character classes can) or that use the anchors ^ and $.

use strict;
use warnings;
use 5.010;

my @rules = (
  {
    pattern => '[aeiou]',
    replace => '*', 
  },
  {
    pattern => '([fF])',
    replace => '<$1>',
  },
);

say replace('then text Foo bar is processed into F** b*r', @rules);


sub replace {
  my ($text, @rules) = @_;

  my @lines = split /\n/, $text;

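  for my $rule (@rules) {
    my ($pattern, $replace) = @{$rule}{qw/ pattern replace /};
    $replace =~ s/"/\\"/g;   # escape any double quotes already in the replacement
    # 1st e: build the string "<replacement>"; 2nd e: evaluate it, interpolating $1 etc.
    s/$pattern/'"'.$replace.'"'/gee for @lines;
  }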

  join "\n", @lines;
}

output

th*n t*xt <F>** b*r *s pr*c*ss*d *nt* <F>** b*r
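A quick way to see when splitting into lines does change the result, using an illustrative two-line string. Without /m, ^ matches only at the start of the whole string, so splitting first gives every line its own start; and \s matches a newline even without /s, so an unsplit string lets a match span two lines.

```perl
use strict;
use warnings;
use 5.014;    # for say and the /r modifier

my $text = "foo\nfoo";

# Anchor: without /m, ^ matches only at the very start of the string...
my $whole_anchor = $text =~ s/^f/F/gr;
# ...but after splitting, every line has its own start.
my $lines_anchor = join "\n", map { s/^f/F/gr } split /\n/, $text;

# \s matches the newline, so on the unsplit string the match spans two lines.
my $whole_space = $text =~ s/o\sf/X/gr;
my $lines_space = join "\n", map { s/o\sf/X/gr } split /\n/, $text;

say for $whole_anchor, $lines_anchor, $whole_space, $lines_space;
```

Here the anchored rule changes only the first line of the unsplit string but both lines of the split one, and the \s rule matches only when the text is not split.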
Borodin

I posted my proposed solution as a comment because I wasn't sure if there was a better solution. Since @Borodin came up with essentially the same solution (on his own), I figured I'd post some code I wrote dealing with this and my thoughts on it.

Here's the code I had:

use strict;
use warnings;

my @rules = ({pattern => '[aeiou]', replace => '*'},
             {pattern => 't', replace => 'T'},
             {pattern => '([fF])', replace => '<$1>'});

my $text = "Foo bar\nLine two";
print $text . "\n\n";
my @lines = split("\n", $text);

foreach ( @rules ) {
    my $pattern = $_->{"pattern"};
    my $replace = '"' . $_->{"replace"} . '"';
    print "Replacing $pattern with $replace\n";
    @lines = map {
        $_ =~ s/$pattern/$replace/geer;
    } @lines;
}

print "\nOutput: \n". join("\n", @lines);

Output:

Foo bar
Line two

Replacing [aeiou] with "*"
Replacing t with "T"
Replacing ([fF]) with "<$1>"

Output: 
<F>** b*r
L*n* Tw*

Basically, this becomes a problem when you are replacing something with a " in it, such as {pattern => 'L', replace => '"l'}. Then we get some errors:

Bareword found where operator expected at (eval 7) line 1, near """l"
    (Missing operator before l?)
String found where operator expected at (eval 7) line 1, at end of line
    (Missing semicolon on previous line?)
Use of uninitialized value in substitution iterator at test11.pl line 15.

This part is solved when you have a \" instead: {pattern => 'L', replace => '\"l'}

And our output becomes:

<F>** b*r
"l*n* tw*

However, this breaks again if you have three backslashes: {pattern => 'L', replace => '\\\"l'}.

It just seemed like a fragile solution, because you can't blindly replace " with \" in your rules. I was hoping there was a better solution, which is why I posted it as a comment.
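One way to avoid the escaping problem altogether is to not evaluate the replacement as Perl code at all, and instead expand $1, $2, ... yourself inside a single /e. This is a minimal sketch, not what either answer does, assuming Perl 5.26 or newer (for @{^CAPTURE}) and assuming plain $N is the only placeholder syntax the rules file needs; expand_template and apply_rules are hypothetical names. Because nothing is evaluated, quotes, backslashes and @ in a rule are copied through literally.

```perl
use strict;
use warnings;
use 5.026;    # assumption: Perl 5.26+ for @{^CAPTURE}

# Hypothetical helper: replaces each $N in $template with the Nth capture.
# Everything else in the template is copied literally; no code is evaluated.
sub expand_template {
    my ($template, @caps) = @_;    # $caps[0] is $1, $caps[1] is $2, ...
    $template =~ s{\$(\d+)}{
        ($1 >= 1 && $1 <= @caps) ? ($caps[$1 - 1] // '') : ''
    }ge;
    return $template;
}

sub apply_rules {
    my ($text, @rules) = @_;
    my @lines = split /\n/, $text;
    for my $rule (@rules) {
        my ($pattern, $replace) = @{$rule}{qw(pattern replace)};
        for (@lines) {
            # Copy the current match's captures before the helper runs its own regex.
            s{$pattern}{ my @caps = @{^CAPTURE}; expand_template($replace, @caps) }ge;
        }
    }
    return join "\n", @lines;
}

say apply_rules("Foo bar\nLine two",
    { pattern => '[aeiou]', replace => '*' },
    { pattern => '([fF])',  replace => '<$1>' },
    { pattern => 'L',       replace => '\\\"l' });
```

With the rule that broke the eval approach, the replacement text \\"l is simply inserted as written, so the second output line becomes \\"l*n* tw*. The trade-off is that rules can no longer use arbitrary Perl expressions in the replacement, only capture references.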

hmatt1