I am trying to extract the content of a text file between two tags and store it in another file.
I managed to read the input file into a multi-line string variable and then used a regexp successfully to get what I want into a variable.
But writing that variable to a file fails. I assume this is because the string contains multiple \n characters.
I would appreciate any help. (This is my first Perl script…)
For the test I use an index.html file, but it can be any text file.
Edit: solved. The cause was variable scoping, not the newlines; see the correction in the code comments.
Here is my commented code:
# Extract string between two tags
use strict;
use warnings;
my $inputfile = "";
my $outputfile = "";
# Parse command-line arguments
if(@ARGV < 2)
{
print "Usage: perl Macro_name.pl <inputfile.HTML> <outfile.HTML>\n";
exit(1);
}
$inputfile = $ARGV[0];
$outputfile = $ARGV[1];
my $body="";
# Convert input file to multiple line string #
$body = File_to_Var_Multi_Line($inputfile);
# First tag & Second tag match
if ( $body =~ /(.*)<body(.*?)>(.*)<\/body>/s )
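{
# Error: "my" declares a new $body that is local to this block and shadows
# the outer $body, so the outer $body still holds the whole file afterwards.
my $body = $3;
# Correction: declare another variable before the if, assign $3 to it here
# (without "my"), and write that variable to the output file.
# Print to check that the extraction is OK
print $body, "\n";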
}
# Write the matched multi-line string to the output file #
open(my $fh_body, '>:encoding(UTF-8)', $outputfile)
or die "Could not open file '$outputfile' $!";
print $fh_body "$body\n";
close $fh_body;
# sub #
sub File_to_Var_Multi_Line
{
if(@_ < 1)
{
print "Usage: line=File_to_Var_Multi_Line<file>\n";
exit(1);
}
my $inputfile_2 = "";
$inputfile_2 = $_[0];
open(my $fl_in, '<:encoding(UTF-8)', $inputfile_2)
or die "Could not open file '$inputfile_2' $!";
my $line = "";
# Append each row to build one multi-line string
while (my $row_2 = <$fl_in>)
{
$line .= $row_2;
}
close $fl_in;
return $line;
}
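To illustrate the scoping issue described in the comments above, here is a minimal sketch. The variable names and the sample string are made up for illustration and are not part of the script. A `my` inside the if block creates a second variable that disappears at the closing brace, so the outer variable is left unchanged. Because the two declarations are in different scopes, `use warnings` does not report anything.

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the file content
my $text = "<html><body>hello</body></html>";

# Shadowing: the inner "my" creates a second, block-local $shadowed
my $shadowed = $text;
if ( $shadowed =~ /<body[^>]*>(.*?)<\/body>/s )
{
    my $shadowed = $1;    # new variable, gone after the closing brace
}

# Fix: declare the result variable before the if, assign without "my"
my $extracted = "";
if ( $text =~ /<body[^>]*>(.*?)<\/body>/s )
{
    $extracted = $1;
}

print "shadowed:  $shadowed\n";     # still the whole string
print "extracted: $extracted\n";    # only the body content
```

The first print shows the complete original string, the second shows only `hello`.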
And the input test file :
<html>
<body>
<a href="page1.html">page 1</a><br>
<a href="page2.html">page 2</a><br>
<a href="page3.html">page 3</a><br>
<a href="page4.html">page 4</a><br>
<a href="page5.html">page 5</a><br>
</body>
</html>
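For reference, a more compact sketch of the same approach with the scoping fix applied. It is only one possible way to write it: the sub names `slurp_file` and `extract_body` are hypothetical, the file is read in one go by localizing `$/` instead of looping over rows, and the regexp uses a non-greedy `(.*?)` so that it stops at the first `</body>` (the original uses a greedy `(.*)`).

```perl
use strict;
use warnings;

# Read a whole file into one string by temporarily undefining the
# input record separator $/ ("slurp" mode).
sub slurp_file
{
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Could not open file '$path' $!";
    local $/;                # undef: <$fh> now returns the entire file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Return the text between <body ...> and </body>, or undef if absent.
sub extract_body
{
    my ($html) = @_;
    return $html =~ /<body[^>]*>(.*?)<\/body>/s ? $1 : undef;
}

# Only run the file handling when both file names are given
if (@ARGV >= 2)
{
    my ($in, $out) = @ARGV;
    my $extracted = extract_body(slurp_file($in));
    die "No <body> element found in '$in'\n" unless defined $extracted;
    open(my $fh_out, '>:encoding(UTF-8)', $out)
        or die "Could not open file '$out' $!";
    print $fh_out "$extracted\n";
    close $fh_out;
}
```

Assuming the sketch is saved as `extract_body.pl` (a made-up name), it would be called as `perl extract_body.pl index.html body.html`. With the test file above, the output file should contain the five link lines.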