
I usually parse HTML with regex, but I need help parsing the following table:

            <table class="resultstable" width="100%" align="center">
                <tr>
                    <th width="10">#</th>
                    <th width="10"></th>
                    <th width="100">External Volume</th>
                </tr>                   
                <tr class='odd'>
                        <td align="center">1</td>
                        <td align="left">
                            <a href="#" title="http://xyz.com">http://xyz.com</a>
                            &nbsp;
                        </td>
                        <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
                    </tr>

                     <tr class='even'>
                        <td align="center">2</td>
                        <td align="left">
                            <a href="#" title="http://abc.com">http://abc.com</a>
                            &nbsp;
                        </td>
                        <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
                    </tr>
            </table>

I want to get every domain together with its volume (in an array or a variable), for example:

http://xyz.com - 210,779,783

Should I use regex or the HTML DOM in this case? I don't know how to parse a large table. Can you please help? Thanks.

Rich
seoppc
  • You should nearly always use HTML DOM. This case is no different. – Madara's Ghost Mar 30 '12 at 18:23
  • See [this question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). You should **never** parse HTML using a regex. – HellaMad Mar 30 '12 at 18:25
  • @Truth Can you please help me with HTML DOM? I have only used it for simple parsing, not for a big table. Thanks. – seoppc Mar 30 '12 at 18:26

1 Answer


Here's an XPath example that parses the HTML from the question.

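<?php
// Load the page into a DOM tree and prepare an XPath evaluator for it.
$dom = new DOMDocument();
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);

// Select every row of the first table with class "resultstable".
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
  // The domain is the link text in the second cell; the header row has no <td>, so skip it.
  $tdList = $xpath->query("td[2]/a", $tr);
  if ($tdList->length == 0) continue;
  $name = trim($tdList->item(0)->nodeValue);
  // The volume is the first child node of the third cell, i.e. the text before <br />.
  $tdList = $xpath->query("td[3]", $tr);
  if ($tdList->length == 0) continue;
  $vol = trim($tdList->item(0)->childNodes->item(0)->nodeValue);
  echo "name: {$name}, vol: {$vol}\n";
}
?>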
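
Since the question asks for the results "in array or var", here is a minimal sketch of the same XPath approach that collects the rows into an associative array instead of echoing them. The function name `parseResultsTable` and the inline `$html` string are assumptions made for illustration; they are not part of the code above. It also assumes each domain appears only once, since a repeated domain would overwrite the earlier entry.

```php
<?php
// Hypothetical helper (not from the original answer): returns [domain => volume].
function parseResultsTable(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally so imperfect real-world markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $results = [];

    // "//tr" instead of "/tr" so rows are matched with or without a <tbody>.
    foreach ($xpath->query("//table[@class='resultstable']//tr") as $tr) {
        $link = $xpath->query("td[2]/a", $tr)->item(0);
        // First text node of the third cell, i.e. the figure before <br />.
        $volume = $xpath->query("td[3]/text()[1]", $tr)->item(0);
        if ($link === null || $volume === null) {
            continue; // header row or a row with an unexpected layout
        }
        $results[trim($link->nodeValue)] = trim($volume->nodeValue);
    }

    return $results;
}

// Sample input: a trimmed-down copy of the table from the question.
$html = <<<HTML
<table class="resultstable" width="100%" align="center">
  <tr><th>#</th><th></th><th>External Volume</th></tr>
  <tr class="odd">
    <td align="center">1</td>
    <td align="left"><a href="#" title="http://xyz.com">http://xyz.com</a>&nbsp;</td>
    <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
  </tr>
  <tr class="even">
    <td align="center">2</td>
    <td align="left"><a href="#" title="http://abc.com">http://abc.com</a>&nbsp;</td>
    <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
  </tr>
</table>
HTML;

$volumes = parseResultsTable($html);
foreach ($volumes as $domain => $volume) {
    echo "{$domain} - {$volume}\n";
}
```

The volumes are kept as strings so the output matches the format in the question. If you need numbers, something like `(int) str_replace(',', '', $volume)` should work for values of this size on a 64-bit PHP build.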
dldnh