How to use newline in substitution s/// in RE using Perl?

Question

The input file contains runs of more than one newline and empty tags, as follows:

<html>
<body>
<title>XXX</title>
<p>text...</p>
<collaboration seq="">
<ce:text></ce:text>
</collaboration>
...


<p>text</p>
<collaboration seq="">
<ce:text>AAA</ce:text>
</collaboration>
<p>text</p>
</body>
</html>

The output file should contain only single newlines, and the empty tags must be removed:

<html>
<body>
<title>XXX</title>
<p>text...</p>
...
<p>text</p>  
<p>text</p>
<collaboration seq="">
<ce:text>AAA</ce:text>
</collaboration>
</body>
</html>

The code I have tried:

print "Enter the file name without extension: ";
chomp($filename=<STDIN>);
open(RED,"$filename.txt") || die "Could not open TXT file";
open(WRIT,">$filename.html");
while(<RED>)
{
  #process in file
  s/<collaboration seq="">\n<ce:text><\/ce:text>\n<\/collaboration>//g;
  s/\n\n//g;
  print WRIT $_;
}
close(RED);
close(WRIT);

The above code doesn't remove anything that needs removing. How can I solve this?

Your script works line by line, so you cannot match across multiple lines. And if you have a string with the whole text you need the /m flag for multiline match. – Jens, Dec 17 '14 at 08:49
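To make the line-by-line point concrete, here is a minimal sketch. The sample string is an assumption made up for illustration, and it is read through an in-memory filehandle so the script needs no input file. It counts how often the question's multi-line pattern matches when the text is read one line at a time, and how often it matches against the whole string.

```perl
use strict;
use warnings;

# Sample text (an assumption for illustration): an empty tag spread over three lines.
my $text = qq{<p>a</p>\n<collaboration seq="">\n<ce:text></ce:text>\n</collaboration>\n<p>b</p>\n};

# The multi-line pattern from the question.
my $empty = qr{<collaboration seq="">\n<ce:text></ce:text>\n</collaboration>};

# Line by line: each chunk holds a single line, so the pattern never matches.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ $empty;
}
close($fh);

# Whole string: the same pattern matches. A literal \n in a pattern needs
# no extra flag; /m only changes where ^ and $ are allowed to match.
my $whole_hits = () = $text =~ /$empty/g;

print "line-by-line matches: $line_hits\n";
print "whole-string matches: $whole_hits\n";
```

With this sample the line-by-line count is 0 and the whole-string count is 1, which is why the substitutions in the question appear to do nothing.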

score 0 · Answer 1 · edited May 23 '17 at 12:01

First, you should actually slurp the file. Let's say that you use zigdon's method:

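my ($file, $filename);
{
    print "Enter the file name without extension: ";
    $filename = <STDIN>;
    chomp($filename);

    open(my $in, '<', "$filename.txt") or die "Can't read $filename.txt: $!";
    local $/;  # enable slurp mode, locally
    $file = <$in>;
    close($in);
}

Now $file contains the entire contents of your file, so you can work on it as a single string.

# process the whole file at once
# \R matches any line ending (\n, \r\n, ...) and needs Perl 5.10 or later
$file =~ s/<collaboration seq="">\R<ce:text><\/ce:text>\R<\/collaboration>//g;
$file =~ s/\R{2,}/\n/g;  # I'm guessing this is probably what you intended

open(my $out, '>', "$filename.html") or die "Can't write $filename.html: $!";
print $out $file;
close($out);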
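As a rough check of what the two substitutions do, the sketch below applies them to a small in-memory string instead of a file. The sample content is an assumption, shortened from the question's input; the regular expressions are the ones from this answer, written with the =~ binding operator.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the question's file.
my $file = join "\n",
    '<p>text...</p>',
    '<collaboration seq="">',
    '<ce:text></ce:text>',
    '</collaboration>',
    '',
    '',
    '<p>text</p>',
    '<collaboration seq="">',
    '<ce:text>AAA</ce:text>',
    '</collaboration>',
    '';

# Remove the empty collaboration block (\R needs Perl 5.10 or later).
$file =~ s/<collaboration seq="">\R<ce:text><\/ce:text>\R<\/collaboration>//g;

# Collapse every run of two or more line endings into a single newline.
$file =~ s/\R{2,}/\n/g;

print $file;
```

For this sample, the empty block and the blank lines disappear, while the collaboration element that contains AAA is left untouched. Removing the block leaves its surrounding newlines behind, and the second substitution is what folds them away.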

answered Dec 17 '14 at 13:36

Jonathan Mee


don't parse HTML/XML with regular expressions. ever. – Patrick J. S. Dec 17 '14 at 15:39
@PatrickJ.S. I'd generally agree with that comment, because Perl does have a lot of good xml extensions. However, that's a separate question and serves only to unnecessarily obscure what should be a simple answer here. – Jonathan Mee Dec 17 '14 at 15:51

score 0 · Answer 2 · answered Dec 20 '14 at 04:19

You can use XML::Simple for this:

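use XML::Simple;

# $xml is assumed to hold the XML document as a string (or the path to an XML file)

# use XML::Simple to process the XML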
my $xs = XML::Simple->new(
      # remove extra whitespace
      NormaliseSpace => 2,
      # keep root element
      KeepRoot       => 1,
      # force elements to arrays
      ForceArray     => 1,
      # ignore empty elements
      SuppressEmpty  => 1
);
# read in the XML
my $ref = $xs->XMLin($xml);


# print out the XML minus the empty tags
print $xs->XMLout($ref);
