5

I'm trying to parse HTML loaded with loadHTML, but I'm having trouble. I managed to loop through all the <tr>s in the document, but I don't know how to loop through the <td>s in each row.

This is what I did so far:

$DOM->loadHTML($url);
$rows= $DOM->getElementsByTagName('tr');

for ($i = 0; $i < $rows->length; $i++) { // loop through rows
    // loop through columns
    ...
}

How can I loop through the columns in each row?

Salman A
lisovaccaro
  • Easier-to-use [wrappers](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php) around the DOM methods exist, specifically for looping over element collections. – mario Jan 09 '13 at 21:10

3 Answers

8

DOMElement also supports getElementsByTagName:

$DOM = new DOMDocument();
$DOM->loadHTMLFile("file path or url");
$rows = $DOM->getElementsByTagName("tr");
for ($i = 0; $i < $rows->length; $i++) {
    $cols = $rows->item($i)->getElementsByTagName("td");
    for ($j = 0; $j < $cols->length; $j++) {
        echo $cols->item($j)->nodeValue, "\t";
        // you can also use DOMElement::textContent
        // echo $cols->item($j)->textContent, "\t";
    }
    echo "\n";
}
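
For reference, the same nested traversal can also be written with foreach, since a DOMNodeList can be iterated directly. The following is only a minimal sketch: the inline $html markup and the $table array are made up for illustration, and it calls loadHTML (which takes a markup string) rather than loadHTMLFile.

```php
<?php
// Hypothetical sample markup, used only so the sketch runs on its own.
$html = '<table>'
      . '<tr><td>a1</td><td>a2</td></tr>'
      . '<tr><td>b1</td><td>b2</td></tr>'
      . '</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($html); // loadHTML() expects markup, not a path or URL

$table = []; // one inner array of cell texts per row
foreach ($DOM->getElementsByTagName('tr') as $row) {
    $cells = [];
    // calling getElementsByTagName() on the row limits the search to that row
    foreach ($row->getElementsByTagName('td') as $col) {
        $cells[] = trim($col->textContent);
    }
    $table[] = $cells;
}

print_r($table);
```

Collecting the cells into an array first, instead of echoing them, makes it easier to check whether a column is really empty.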
Salman A
  • I haven't been able to echo the col content inside the loop. I tried `echo $cols->item($i)->nodeValue;`, could you edit it? I'll take this one if it works as it's easier to implement in my case – lisovaccaro Jan 09 '13 at 21:32
  • I have made minor changes to the code. See if it works. And see if the column is not empty. – Salman A Jan 09 '13 at 21:41
2

Use DOMXPath to query out the child column nodes with a relative xpath query, like this:

$xpath = new DOMXPath( $DOM);
$rows= $xpath->query('//table/tr');

foreach( $rows as $row) {
    $cols = $xpath->query( 'td', $row); // Get the <td> elements that are children of this <tr>
    foreach( $cols as $col) {
        echo $col->textContent;
    }
}

Edit: To start at a specific row and stop before the end, keep your own index on the rows by changing how you iterate over the DOMNodeList:

$xpath = new DOMXPath( $DOM);
$rows= $xpath->query('//table/tr');

for( $i = 3, $max = $rows->length - 2; $i < $max; $i++) {
    $row = $rows->item( $i);
    $cols = $xpath->query( 'td', $row);
    foreach( $cols as $col) {
        echo $col->textContent;
    }
}
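
As an alternative to keeping the index by hand, the row range could probably be pushed into the XPath expression itself with the standard position() and last() functions. This is a sketch under assumptions: the six-row $html sample and the $picked array are invented for illustration, and the predicate assumes all rows are direct children of a single table.

```php
<?php
// Hypothetical markup: six single-cell rows, r0 through r5.
$html = '<table>';
for ($n = 0; $n < 6; $n++) {
    $html .= "<tr><td>r$n</td></tr>";
}
$html .= '</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($html);
$xpath = new DOMXPath($DOM);

// XPath positions are 1-based, so this keeps the rows whose 0-based
// index runs from 3 up to (but not including) length - 2.
$rows = $xpath->query('//table/tr[position() > 3 and position() <= last() - 2]');

$picked = [];
foreach ($rows as $row) {
    // relative query: only <td> children of the current row
    foreach ($xpath->query('td', $row) as $col) {
        $picked[] = $col->textContent;
    }
}
echo implode(', ', $picked), "\n";
```

With six rows, both the for loop above and this predicate select only the row at 0-based index 3.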
nickb
  • this works, I just have a problem, how can I start from row 3 and end in totalrows - 2? I was using `($i = 3; $i < $rows->length -2; $i++)` before for the loop – lisovaccaro Jan 09 '13 at 21:27
  • @Liso - You can keep those counts yourself, I'll update my answer – nickb Jan 09 '13 at 21:32
  • @Liso - All `$xpath->query()` is giving you back is a DOMNodeList, so you can iterate over it just the same as you were before. The benefit is that now, instead of just using `getElementsByTagName()`, you have much more control over what actually gets put in that DOMNodeList. Try my updated solution, it should work for your requirements. – nickb Jan 09 '13 at 21:41
0

Would re-looping work?

$DOM->loadHTML($url);
$rows= $DOM->getElementsByTagName('tr');
$tds= $DOM->getElementsByTagName('td');

for ($i = 0; $i < $rows->length; $i++) {
    // loop through rows
    for ($j = 0; $j < $tds->length; $j++) {
        // loop through every <td> in the document

    }

}

EDIT You will also have to check the parent node to make sure that the parent of each <td> is the <tr> you are currently in. Something like

if ($tds->item($j)->parentNode->isSameNode($rows->item($i))) {
// do whatever
}

May not be syntactically 100% correct, but the concept is sound.
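
The parent-check idea can be sketched end to end as follows. This is only an illustration, not a recommended approach: the sample $html and the $byRow array are hypothetical, and the parentNode property and isSameNode() method of DOMNode are used for the comparison. Note that it visits every cell once per row, so it scales poorly on large tables.

```php
<?php
// Hypothetical sample markup so the sketch can run on its own.
$html = '<table>'
      . '<tr><td>a1</td><td>a2</td></tr>'
      . '<tr><td>b1</td></tr>'
      . '</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($html);
$rows = $DOM->getElementsByTagName('tr');
$tds  = $DOM->getElementsByTagName('td');

$byRow = []; // cell texts grouped by row index
for ($i = 0; $i < $rows->length; $i++) {
    $byRow[$i] = [];
    for ($j = 0; $j < $tds->length; $j++) {
        $td = $tds->item($j);
        // keep the cell only if its direct parent is the current row
        if ($td->parentNode->isSameNode($rows->item($i))) {
            $byRow[$i][] = $td->textContent;
        }
    }
}
print_r($byRow);
```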

Zak