I have a group of HTML files from which I need to extract the content between the `<hr>` and `</hr>` tags. I have done everything except this extraction. What I have done so far:
1. Loaded all the HTML files and stored them in `@html_files`.
2. Stored each file's content in the `@useful_files` array.
3. Looped over the `@useful_files` array, checking each line for `<hr>`. If it is found, I need the following lines of content in the `@elements` array.
Is this possible? Am I on the right track?
    foreach my $single_file (@html_files) {
        my @elements;
        open my $fh, '<', "$dir/$single_file"
            or die "Could not open '$single_file': $!\n";
        my @useful_files = <$fh>;    # all lines of the current file
        close $fh;
        foreach my $line (@useful_files) {
            chomp($line);
            if ($line =~ /<hr>/) {
                @elements = ($line);    # stores only the matching line itself
            }
        }
        create(@elements, $single_file);
    }
Thanks!
My input HTML file looks like this:
<HR SIZE="3" style="COLOR:#999999" WIDTH="100%" ALIGN="CENTER">
<P STYLE="margin-top:0px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. </FONT></P>
<P STYLE="font-size:12px;margin-top:0px;margin-bottom:0px"> </P>
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%" BORDER="0" STYLE="BORDER-COLLAPSE:COLLAPSE">
<TR>
<TD WIDTH="45%"></TD>
<TD VALIGN="bottom" WIDTH="1%"></TD>
<TD WIDTH="4%"></TD>
<TD VALIGN="bottom"></TD>
<TD WIDTH="4%"></TD>
<TD VALIGN="bottom" WIDTH="1%"></TD>
<TD WIDTH="44%"></TD></TR>
<TR>
<TD VALIGN="top"></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"><FONT STYLE="font-family:Times New Roman" SIZE="2">Title:</FONT></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"><FONT STYLE="font-family:Times New Roman" SIZE="2">John</FONT></TD></TR>
</TABLE>
<p Style='page-break-before:always'>
<HR SIZE="3" style="COLOR:#999999" WIDTH="100%" ALIGN="CENTER">
The HTML above is just a sample. I need the exact content between the `<hr>` tags in the `@elements` array.
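For illustration, the line-by-line approach described above could be sketched roughly as follows. This is only a sketch under assumptions, not a definitive solution: the helper name `sections_between_hr` is hypothetical, each `<hr>` tag is assumed to sit on its own line (as in the sample), and the pattern is made case-insensitive and attribute-tolerant so that it also matches `<HR SIZE="3" ...>`.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of one file and returns one array
# reference per block of lines found between two consecutive <hr> tags.
sub sections_between_hr {
    my @lines = @_;
    my @sections;    # completed blocks
    my $current;     # lines seen since the last <hr>; undef before the first one
    for my $line (@lines) {
        chomp(my $text = $line);
        # Matches <hr>, <HR ...> and <hr/>; assumes the tag is alone on its line.
        if ($text =~ /<hr\b[^>]*>/i) {
            push @sections, $current if defined $current;    # close the open block
            $current = [];                                   # start a new block
            next;
        }
        push @$current, $text if defined $current;
    }
    return @sections;    # lines after the last <hr> are not returned
}

# Usage with a shortened version of the sample input
my @sample = (
    "<p>before</p>\n",
    qq{<HR SIZE="3" WIDTH="100%" ALIGN="CENTER">\n},
    "<P>Lorem ipsum</P>\n",
    "<TABLE></TABLE>\n",
    qq{<HR SIZE="3" WIDTH="100%" ALIGN="CENTER">\n},
    "<p>after</p>\n",
);
my @sections = sections_between_hr(@sample);
print "$_\n" for @{ $sections[0] };
```

In the loop from the question, the lines read from `$fh` could be passed to such a helper in place of the inner `foreach`. For HTML that is less regular than the sample, a dedicated HTML parser module would likely be more robust than matching line by line.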
[…] `<hr>` and `</hr>` from the existing html. – user3431651 Jan 29 '15 at 12:06
`[…](.*)#[…]newcontents#g' file.html`. It replaces the contents between each hr tag with *newcontents*. Or do you "need" a Perl variant for that? – Marc Bredt Jan 29 '15 at 12:12
[… `<hr>` is defined to be empty](http://www.w3.org/TR/2014/REC-html5-20141028/grouping-content.html#the-hr-element). Since you have something strange going on there, if you throw that into different browsers, you'll probably get different results. – Patrick J. S. Jan 29 '15 at 14:40
[…] `<hr>` and `</hr>` will not get you any results. – tjwrona1992 Jan 30 '15 at 20:37