How can I extract HTML table data using Perl?

Question

I need to retrieve some data from a web page. After analysing the page's HTML, I found that the data I need is embedded in a table with a unique table id. I don't know whether a unique id is an HTML rule or not, but I think it makes parsing much easier.

The data in the table is arranged as shown below (various attributes and tags have been omitted to give a clear picture of the data structure):

<table .... id = "tablename" .... >
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
         #several more "tr" rows here
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
</table>

So my question is: how can I use Perl's HTML parser utilities to extract this data?

Thanks in advance.

score 12 · Accepted Answer · answered Dec 21 '09 at 07:33

HTML::TableExtract sounds exactly like what you are looking for.
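
As a minimal sketch of how this might look (not necessarily what the answerer had in mind), the example below selects the table by its id using the module's `attribs` option and collects each row's cells. The inline `$html` string, the second table, and the `@rows` variable are made up for illustration; in practice the HTML would come from the fetched page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the downloaded page.
my $html = <<'HTML';
<table border="1" id="tablename">
  <tr><td>field1</td><td>field2</td></tr>
  <tr><td>a</td><td>b</td></tr>
</table>
<table id="other"><tr><td>ignored</td></tr></table>
HTML

# Only tables whose tag carries id="tablename" are extracted.
my $te = HTML::TableExtract->new( attribs => { id => 'tablename' } );
$te->parse($html);
$te->eof;

# Each row comes back as an array reference of cell texts;
# cells with no content may be undef, so map them to ''.
my @rows;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        push @rows, [ map { defined $_ ? $_ : '' } @$row ];
    }
}

print join( "\t", @$_ ), "\n" for @rows;
```

Matching on the id attribute avoids depending on the table's position in the page, which is the advantage the question hints at.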

answered Dec 21 '09 at 07:33

Leon Timmermans

score 2 · Answer 2 · edited Dec 23 '09 at 02:09

Use HTML::Table.

edited Dec 23 '09 at 02:09

brian d foy

answered Dec 21 '09 at 11:30

Pradeep

score -1 · Answer 3 · edited Dec 23 '09 at 02:15

Look at Ken MacFarlane's "Parsing HTML with HTML::Parser" in The Perl Journal. I'm not sure if that's the parser you're referring to, but it looks like it can do what you want, or at least point you in the right direction.

edited Dec 23 '09 at 02:15

brian d foy

answered Dec 21 '09 at 05:55

Chris Thompson

You shouldn't have to reach down into HTML::Parser for this. There are many tools built on top of it that should be able to handle the job. – brian d foy Dec 23 '09 at 02:15
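
To illustrate the comment's point, here is a rough sketch using HTML::TreeBuilder, one module built on top of HTML::Parser. The choice of module is an assumption, since the comment does not name one. The inline `$html` string and the `@rows` variable are likewise hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the downloaded page.
my $html = <<'HTML';
<html><body>
<table id="tablename">
  <tr><td>field1</td><td>field2</td></tr>
  <tr><td>a</td><td>b</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find the one table element with the wanted id.
my $table = $tree->look_down( _tag => 'table', id => 'tablename' )
    or die "table not found\n";

# Collect the trimmed text of every td/th cell, row by row.
my @rows;
for my $tr ( $table->look_down( _tag => 'tr' ) ) {
    push @rows,
        [ map { $_->as_trimmed_text } $tr->look_down( _tag => qr/^t[dh]$/ ) ];
}

$tree->delete;    # release the parse tree explicitly

print join( "\t", @$_ ), "\n" for @rows;
```

Note that `look_down` descends through the whole subtree, so a table nested inside a cell would contribute its rows as well; this sketch assumes the target table has no nested tables.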