I have an HTML file containing a two-column table that I want to parse in order to extract, for each row, the pair of strings in its two cells. The layout of the HTML (whitespace, line breaks) is arbitrary, so I can't parse the file line by line.
I recall that this kind of thing can be done by slurping the whole file into a string and matching against the entire string, but I'm having trouble getting that to work. I'm trying things like the following:
```perl
#!/usr/bin/perl
open(FILE, "Glossary") || die "Couldn't open file\n";
@lines = <FILE>;
close(FILE);
$data = join(' ', @lines);
while ($data =~ /<tr>.*(<td>.*<\/td>).*(<td>.*<\/td>).*<\/tr>/g) {
    print $1, ":", $2, "\n";
}
```
This prints nothing at all. Here's a section of the input file:
```html
<table class="wikitable">
<tr>
<td><b>Term</b>
</td>
<td><b>Meaning</b>
</td></tr>
<tr>
<td><span id="0-Day">0-Day</span>
</td>
<td>
<p>See <a href="#Zero_Day">Zero Day</a>.
</p>
</td>
```
Can someone help me out?
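The loop above prints nothing because, without the `/s` modifier, `.` does not match a newline, and in the sample every `<tr>` is followed by one. Adding `/s` alone would not be enough either: the greedy `.*` would then let a single match run from the first `<tr>` to the last `</tr>`. Below is a minimal sketch of the slurp-and-match approach, assuming each row is a `<tr>` with exactly two `<td>` cells and no nested tables. The helper names `slurp` and `extract_pairs` and the script name `extract_pairs.pl` are hypothetical. It reads the file in one go by localizing `$/`, matches with `/s` and non-greedy `.*?`, and strips inner tags such as `<b>`, `<span>`, `<a>` and `<p>` from each cell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string ("slurp").
# Localizing $/ (the input record separator) to undef makes <$fh>
# return everything up to end-of-file instead of a single line.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Couldn't open $path: $!\n";
    my $content = do { local $/; <$fh> };
    close($fh);
    return $content;
}

# Hypothetical helper: return one [left, right] pair per table row.
# Assumes every row is <tr>...</tr> with exactly two <td>...</td> cells
# and no nested tables; this is not a general HTML parser.
sub extract_pairs {
    my ($html) = @_;
    my @pairs;
    # /s lets "." match newlines, /i ignores tag case, /g resumes where the
    # previous match ended; ".*?" is non-greedy, so each match stops at the
    # first closing tag rather than the last one in the string.
    while ($html =~ m{<tr[^>]*>.*?<td[^>]*>(.*?)</td>.*?<td[^>]*>(.*?)</td>.*?</tr>}gsi) {
        my ($left, $right) = ($1, $2);
        for ($left, $right) {
            s/<[^>]*>//g;      # drop inner tags like <b>, <span>, <a>, <p>
            s/\s+/ /g;         # collapse whitespace, including newlines
            s/^\s+|\s+$//g;    # trim leading and trailing space
        }
        push @pairs, [$left, $right];
    }
    return @pairs;
}

# Print "left:right" for each row of every file named on the command line.
for my $file (@ARGV) {
    for my $pair (extract_pairs(slurp($file))) {
        print "$pair->[0]:$pair->[1]\n";
    }
}
```

Run it as `perl extract_pairs.pl Glossary`. For the two rows in the excerpt, assuming the second row is closed by `</tr>` in the full file, it should print `Term:Meaning` and `0-Day:See Zero Day.`. A regex like this only holds up for markup as regular as this table; for anything messier, a CPAN parser such as HTML::TableExtract or HTML::TreeBuilder is the safer choice.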