
I'm trying to match the following multiline element (starting with `<Connector`) and comment it out.

    -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
<!-- A "Connector" using the shared thread pool-->

So I tried using the following:

perl -i.bak -pe 'BEGIN{undef $/;}
        s/
            \s+ #space at beginning of line
            (<Connector\s+ #connector
             port=\"8080\"\s+  #port
             protocol=\"HTTP\/1\.1\" #protocol
             \n  #newline
            \s+connectionTimeout=\"20000\"\n # space, connection timeout, then newline
            \s+redirectPort=\"8443\" #redirect port
            \/> # end of connector entry in file
            ) # end capture for $1
        /
            <!--$1-->
        /msx
    ' server.xml

diff server.xml server.xml.bak

But the diff output shows nothing. Any idea what I'm missing here?

lordadmira
Burvil

2 Answers


I think I figured it out.

perl -i.bak -pe 'BEGIN{undef $/;}
        s/
            --> #preceding line ends a comment, with newline at end
            \s+ #space at beginning of line
            (<Connector\s+ #connector
             port=\"8080\"\s+  #port
             protocol=\"HTTP\/1\.1\" #protocol
            \s+connectionTimeout=\"20000\" # space, connection timeout, then newline
            \s+redirectPort=\"8443\" #redirect port
            \s+   #space
            \/> # end of connector entry in file
            ) # end capture for $1
        /
            -->\n<!-- $1 -->
        /msx
    ' server.xml

diff server.xml server.xml.bak
ikegami
Burvil

Don't use that BEGIN block. The usual way to slurp a text file is the -0 switch, which sets the input record separator to the null character. If there is any chance the file contains nulls, use -0777 instead.

If you know precisely what the search text is, you don't need anything as complicated as what you wrote. Perl has that use case covered: the \Q...\E escape automatically quotes any possibly troublesome characters but still allows variable interpolation to happen.

$foo = 'f.oo bar$'; print qr/\Q$foo\E/;
# prints: (?^:f\.oo\ bar\$)

$pattern = qr{\Q<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />\E};
$text =~ s/($pattern)/<!-- $1 -->/;

I see that you want to run it from the command line, so it would be something like this.

perl -i.bak -0 -pe '$pattern = qr{\Q<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />\E};
s/($pattern)/<!-- $1 -->/; ' FILE

The code you supply will execute only once, because the whole file is read as a single "line" of input.

If there is wiggle room in the amount of whitespace, you can do a dynamic substitution on the pattern itself.

$pattern = qq{\Q<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />\E};
# Translate runs of escaped whitespace into a single \s+
# (the replacement needs \\s+; a bare \s there would produce a literal "s+")
$pattern =~ s/(?:\\\s)+/\\s+/g;
$text =~ s/($pattern)/<!-- $1 -->/;

HTH

lordadmira
  • Don't use `-0` either. That doesn't actually do what you want. Use `-0777`. (This is the same as `$/=undef;`.) ...I note that you later mentioned how to do it right, so why did you bother mentioning the wrong way? – ikegami Dec 04 '20 at 05:06
  • Why did you bother making this comment when I explicitly differentiated between -0 and -0777? :D -0 is one of those 99% rules. Just remember the 1%. – lordadmira Dec 04 '20 at 06:10
  • Re "*Why did you bother making this comment when I explicitly differentiated between -0 and -0777?*", As I said, I didn't notice the mention of -0777 when I wrote the comment – ikegami Dec 04 '20 at 06:14
  • Re "*-0 is one of those 99% rules. Just remember the 1%.*", What? Why the hell would I want to use the wrong thing because it only fails 1% of the time? That's complete nonsense. And for what? To save 3 chars? How does that warrant all the confusion? Use the thing that works 100% of the time! – ikegami Dec 04 '20 at 06:27
  • Re "*The \Q \E operator automatically quotes any possibly troublesome characters.*", Not quite. It doesn't quote `$`, `@` and some instances of `\ ` or prevent their effects. – ikegami Dec 04 '20 at 06:27
  • Re "it's wrong". It's not wrong. It's TIMTOWTDI, programmer's discretion. Perl is *postmodern*. Not dogmatic. – lordadmira Dec 04 '20 at 06:27
  • You yourself said it will randomly not work. That's the very definition of wrong. TIMTOWTDI suggests tolerance of other styles. It's not talking about accepting buggy solutions. – ikegami Dec 04 '20 at 06:33
  • I did *not* say it would randomly not work. Any character that you know to not be in the file is suitable for -0. Nulls don't happen at random. BTW, \Q does escape $ and @. Variable substitution happens before quotemeta is applied so it might seem to you that they are not quoted. – lordadmira Dec 04 '20 at 06:41
  • Re "*Variable substitution happens before quotemeta is applied so it might seem to you that they are not quoted*", That's what I said. `\Q..\E` doesn't escape `$` and `@`, but lets them have their special meaning: Interpolation. – ikegami Dec 04 '20 at 06:43
  • *Not quite. It doesn't quote $, @ and some instances of \ or prevent their effects. – ikegami* – lordadmira Dec 04 '20 at 06:45
  • Which is true. Like you just said. `$` and `@` are still treated specially (at least some of the time) within `\Q..\E`. They still perform their interpolation contrary to what you originally said. – ikegami Dec 04 '20 at 06:45
  • Re "*Any character that you know to not be in the file*", You didn't check the file, though. And it would take a lot more than three characters to do that! – ikegami Dec 04 '20 at 06:47
  • I really don't know why you're litigating this. *`quotemeta` does not treat $ and @ specially, ever.* As `qq` processes the string it performs variable substitution before it interprets the `\Q` quotemeta operator. `deparse { "foo\Q .|$bar \$\Ebaz" } 9 --> (('foo' . quotemeta(((' .|' . $bar) . ' $'))) . 'baz');` – lordadmira Dec 04 '20 at 08:04
  • I know why it doesn't escape certain `$`, `@` and `\ `. You didn't have to explain it to me. – ikegami Dec 04 '20 at 08:09
  • `quotemeta` always escapes every $, @, and \ that it sees. Period. – lordadmira Dec 04 '20 at 08:24
  • That's a straw man; I never said otherwise. I said "it doesn't escape certain `$`, `@` and `\ `", while you incorrectly said "`\Q \E` operator automatically quotes any possibly troublesome characters". As you explained, it doesn't even see some of them, so it can't escape them. The only time `\Q..\E` works reliably is on strings interpolated within. – ikegami Dec 04 '20 at 08:41