I have a very large HTML file. I need to extract one particular <div>...</div>
section from it into a variable.
##some contents
<div class="title-bar" onclick="folder(c_1)"><table class="layout"><tr><td class="h1" width="400">Summary Of Test Report <br>(E:\Packages\SamplePackage)</td><td><a style="cursor:hand;text-decoration:none;" onclick="showTOC()"><div style="float:left"><div style="float:right"><div style="float:left"></div></div></a></td></tr></table></div><div expandable="1" id="c_1"><a name="title"></a><table class="content" cellpadding="2"><tr><td><table id="details"><tr><td class="h4">Package Name:</td><td class="info">E:\Packages\SamplePackage</td></tr><tr><td class="h4">OS:</td><td class="info">Microsoft Windows Server 2008 R2 Standard </td></tr><tr><td class="h4">Testing:</td><td class="info">Regression Test</td></tr><tr><td class="h4">Machine Name:</td><td class="info">XYZTST036 (Number Of Cores: 4
; CPU Clock Speed: 3500
Mhz; Memory: 32,494 MB)</td></tr><tr><td class="h4">Duration:</td><td class="info">00:28:31</td></tr><tr><td class="h4">Total No. Of Testcases:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Executed:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Passed:</td><td class="info">42</td></tr><tr><td class="h4">No. Of Testcases Failed:</td><td class="info">0</td></tr><tr><td class="h4">No. Of Testcases NA(Not Appplicable):</td><td class="info">12</td></tr><tr><td class="h4">Skipped Testcases:</td><td class="info"><a href="SkippedTestcaseDetails.html">None</a></td></tr><tr><td class="h4">Date:</td><td class="info">8-02-2016
</td></tr><tr><td class="h4">Start Time(17:58:02)/ Completion Time (18:26:33)</td><td class="info"></td></tr></table></td></tr></table></div></div>
##some contents
I used a regex, like this:
my $html_filepath = "G:\\Report.html";
open(HTML, "<$html_filepath") or die "Can't open $html_filepath $!\n";
$body .= "\nTest Report Summary:\n\n";
my $content;
my $summarySection;
{
    local $/ = undef; # slurp mode
    $content = <HTML>;
}
$content =~ s/\r\n//g;
#print $content;
if ($content ne "")
{
    if ($content =~ m/<div class="title-bar" (.*)/)
    #if ( $last_line =~ m/^<tr> <td>(\d+)<\/td>/ )
    {
        $summarySection = "$1";
    }
}
print "\n $summarySection";
The output I got is:
<div class="title-bar" onclick="folder(c_1)"><table class="layout"><tr><td class="h1" width="400">Summary Of Test Report <br>(E:\Packages\SamplePackage)</td><td><a style="cursor:hand;text-decoration:none;" onclick="showTOC()"><div style="float:left"><div style="float:right"><div style="float:left"></div></div></a></td></tr></table></div><div expandable="1" id="c_1"><a name="title"></a><table class="content" cellpadding="2"><tr><td><table id="details"><tr><td class="h4">Package Name:</td><td class="info">E:\Packages\SamplePackage</td></tr><tr><td class="h4">OS:</td><td class="info">Microsoft Windows Server 2008 R2 Standard </td></tr><tr><td class="h4">Testing:</td><td class="info">Regression Test</td></tr><tr><td class="h4">Machine Name:</td><td class="info">XYZTST036 (Number Of Cores: 4
; CPU Clock Speed: 3500
Mhz; Memory: 32,494 MB)</td></tr><tr><td class="h4">Duration:</td><td class="info">00:28:31</td></tr><tr><td class="h4">Total No. Of Testcases:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Executed:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Passed:</td><td class="info">42</td></tr><tr><td class="h4">No. Of Testcases Failed:</td><td class="info">0</td></tr><tr><td class="h4">No. Of Testcases NA(Not Appplicable):</td><td class="info">12</td></tr><tr><td class="h4">Skipped Testcases:</td><td class="info"><a href="SkippedTestcaseDetails.html">None</a></td></tr><tr><td class="h4">Date:</td><td class="info">8-02-2016
But I need the output to be:
<div class="title-bar" onclick="folder(c_1)"><table class="layout"><tr><td class="h1" width="400">Summary Of Test Report <br>(E:\Packages\SamplePackage)</td><td><a style="cursor:hand;text-decoration:none;" onclick="showTOC()"><div style="float:left"><div style="float:right"><div style="float:left"></div></div></a></td></tr></table></div><div expandable="1" id="c_1"><a name="title"></a><table class="content" cellpadding="2"><tr><td><table id="details"><tr><td class="h4">Package Name:</td><td class="info">E:\Packages\SamplePackage</td></tr><tr><td class="h4">OS:</td><td class="info">Microsoft Windows Server 2008 R2 Standard </td></tr><tr><td class="h4">Testing:</td><td class="info">Regression Test</td></tr><tr><td class="h4">Machine Name:</td><td class="info">XYZTST036 (Number Of Cores: 4
; CPU Clock Speed: 3500
Mhz; Memory: 32,494 MB)</td></tr><tr><td class="h4">Duration:</td><td class="info">00:28:31</td></tr><tr><td class="h4">Total No. Of Testcases:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Executed:</td><td class="info">54</td></tr><tr><td class="h4">No. Of Testcases Passed:</td><td class="info">42</td></tr><tr><td class="h4">No. Of Testcases Failed:</td><td class="info">0</td></tr><tr><td class="h4">No. Of Testcases NA(Not Appplicable):</td><td class="info">12</td></tr><tr><td class="h4">Skipped Testcases:</td><td class="info"><a href="SkippedTestcaseDetails.html">None</a></td></tr><tr><td class="h4">Date:</td><td class="info">8-02-2016
</td></tr><tr><td class="h4">Start Time(17:58:02)/ Completion Time (18:26:33)</td><td class="info"></td></tr></table></td></tr></table></div></div>
I also tried the following regex, anchored on the closing tags:
if ($content =~ m/<div class="title-bar" (.*)<\/table><\/div><\/div>/)
But this did not work.
How can I capture the whole section, including the line breaks and whitespace inside it?
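For reference, here is a minimal sketch of the direction that looks most likely to help. It assumes the capture stops early because "." does not match a newline unless the /s modifier is set, and that the section always ends at the first </table></div></div>. The heredoc is a shortened, hypothetical stand-in for the real report, not the actual file.

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the report; the real file is much larger.
my $content = <<'HTML';
<p>before</p>
<div class="title-bar" onclick="folder(c_1)"><table class="layout"><tr><td>Summary</td></tr></table></div><div expandable="1" id="c_1"><table class="content"><tr><td><table id="details"><tr><td class="info">8-02-2016
</td></tr><tr><td class="h4">Start Time</td><td class="info"></td></tr></table></td></tr></table></div></div>
<p>after</p>
HTML

my $summarySection;

# /s lets "." match newlines as well, so the capture can span several lines.
# ".*?" is non-greedy, so it stops at the first closing sequence, not the last.
# The opening tag sits inside the parentheses so that it is part of $1.
if ($content =~ m{(<div class="title-bar".*?</table></div></div>)}s) {
    $summarySection = $1;
}

print "\n $summarySection\n" if defined $summarySection;
```

This only holds while the closing sequence stays unique within the section. If the report layout can vary, a real HTML parser (for example the HTML::TreeBuilder module) is probably the safer route than a regex.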