I have an html page that has particular text that I want to parse into a databse using a Perl Script.
I want to be able to strip off all the stuff I don't want, an exmple of the html is-
<div class="postbody">
<h3><a href "foo">Re: John Smith <span class="posthilit">England</span></a></h3>
<div class="content">Is C# better than Visula Basic?</div>
</div>
Therefore I would want to import into the database
- Name: John Smith.
- Lives in: England.
- Commented: Is C# better than Visula Basic?
I have started to create a Perl script but it needs to be changed to work for what I want;
use DBI;
open (FILE, "list") || die "couldn't open the file!";
open (F1, ">list.csv") || die "couldn't open the file!";
print F1 "Name\|Lives In\|Commented\n";
while ($line=<FILE>)
{
chop($line);
$text = "";
$add = 0;
open (DATA, $line) || die "couldn't open the data!";
while ($data=<DATA>)
{
if ($data =~ /ds\-div/)
{
$data =~ s/\,//g;
$data =~ s/\"//g;
$data =~ s/\'//g;
$text = $text . $data;
}
}
@p = split(/\\/, $line);
print F1 $p[2];
print F1 ",";
print F1 $p[1];
print F1 ",";
print F1 $p[1];
print F1 ",";
print F1 "\n";
$a = $a + 1;
Any input would be greatly appreciated.