3

I need to retrieve some data from a web page. After analysing the page's HTML, I found that the data I need is embedded in a table with a unique table id. I don't know whether unique ids are an HTML rule or not, but either way it should make parsing easier.

The data in the table is arranged as below (various attributes and tags have been omitted to give you a clear "data structure"):

<table .... id = "tablename" .... >
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
         <!-- several more "tr" rows here -->
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
</table>

So my question is: how can I use Perl's HTML parsing modules to extract the fields from this table?

Thanks in advance.

brian d foy
Haiyuan Zhang

3 Answers

12

HTML::TableExtract sounds exactly like what you are looking for.
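
A minimal sketch of how that could look, assuming the table really carries id="tablename" as in the question. The `$html` string here is a made-up stand-in for the real page (which you would fetch yourself, e.g. with LWP), and the cell values are invented for illustration. The `attribs` option tells HTML::TableExtract to match only tables whose tag attributes match the given hash.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the real page content.
my $html = <<'HTML';
<html><body>
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="tablename">
<tr><td>a1</td><td>a2</td></tr>
<tr><td>b1</td><td>b2</td></tr>
</table>
</body></html>
HTML

# Only match the table whose id attribute is "tablename".
my $te = HTML::TableExtract->new( attribs => { id => 'tablename' } );
$te->parse($html);

# Collect every row of every matched table as an array ref of cell text.
my @rows;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        # Cells can be undef (e.g. empty or spanned cells), so default to ''.
        push @rows, [ map { defined $_ ? $_ : '' } @$row ];
    }
}

# Print one tab-separated line per table row.
print join( "\t", @$_ ), "\n" for @rows;
```

If the page has header cells you can rely on, the module's `headers` option may be a better fit than matching on the id, since it also lets you pick out specific columns.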

Leon Timmermans
2

Use HTML::Table.

brian d foy
Pradeep
-1

Look at Ken MacFarlane's "Parsing HTML with HTML::Parser" in The Perl Journal. I'm not sure if that's the parser you're referring to, but it looks like it can do what you want, or at least point you in the right direction.

brian d foy
Chris Thompson
  • You shouldn't have to reach down into HTML::Parser for this. There are many tools built on top of it that should be able to handle the job. – brian d foy Dec 23 '09 at 02:15