I'm stuck on this. I'm trying to use PHP's DOM extension to parse some HTML. How can I find out how many children the current element has while I iterate over the rows in a for loop?

<?php
$str='
<table id="tableId">
<tr>
    <td>row1 cell1</td>
    <td>row1 cell2</td>
</tr>
<tr>
    <td>row2 cell1</td>
    <td>row2 cell2</td>
</tr>
</table>
';

$DOM = new DOMDocument;
$DOM->loadHTML($str);   // loading page contents
$table = $DOM->getElementById('tableId');   // getting the table that I need
$DOM->loadHTML($table);     

$tr = $DOM->getElementsByTagName('tr');     // getting rows

echo $tr->item(0)->nodeValue;   // outputs row1 cell1 row1 cell2 - exactly as I expect with both rows
echo "<br>";
echo $tr->item(1)->nodeValue;   // outputs row2 cell1 row2 cell2

// now I need to iterate through each row to build an array with cells that it has
for ($i = 0; $i < $tr->length; $i++)
{
echo $tr->item($i)->length;     // outputs no value. But how can I get it?
echo $i."<br />";
}
?>
blackdad
  • possible duplicate of http://stackoverflow.com/questions/2477790/php-dom-not-retrieving-element. Same issues. – gcochard Jun 14 '12 at 20:12
  • do you need the DOM to do that? otherwise i could show you [another](http://simplehtmldom.sourceforge.net/) way.. –  Jun 14 '12 at 20:14
  • DOMDocument::loadHTML takes an html string, on your second call to it(which seems pointless) you pass a DOMElement. – Musa Jun 14 '12 at 20:14
  • I'm not completely familiar with PHP DOM, but since it is PHP and it appears you're working with objects/arrays, couldn't you apply something like `echo count($tr)`? Again, I'm not one hundred percent sure on that, just a thought. – chris Jun 14 '12 at 20:22
  • @Greg: other issue. That one is about fragments & getElementById not working. That part works here. Although there is probably a duplicate out there, for instance [this one](http://stackoverflow.com/questions/191923/how-do-i-iterate-through-dom-elements-in-php). – Wrikken Jun 14 '12 at 20:45

1 Answer

This will give you the number of all child nodes:

$tr->item($i)->childNodes->length;

... but it will also count DOMText nodes containing whitespace etc. (so the count is 4). If you don't necessarily need the length and just want to iterate over the nodes, you can do this:

foreach($tr->item($i)->childNodes as $node){
    if($node instanceof DOMElement){
        var_dump($node->ownerDocument->saveXML($node));
    }
}
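
The same `instanceof` filter can be turned into a count. This is a minimal sketch, not part of the DOM extension: `countElementChildren` is a hypothetical helper name, and the one-row table is made up for illustration.

```php
<?php
// Hypothetical helper: counts only the element children of a node,
// skipping DOMText (whitespace), comments, and other non-element nodes.
function countElementChildren(DOMNode $parent): int
{
    $count = 0;
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $count++;
        }
    }
    return $count;
}

$doc = new DOMDocument;
$doc->loadHTML('<table><tr><td>a</td> <td>b</td></tr></table>');
$row = $doc->getElementsByTagName('tr')->item(0);

// Includes any whitespace text nodes the parser kept
echo $row->childNodes->length, "\n";
// Elements only
echo countElementChildren($row), "\n"; // 2
```

How many whitespace text nodes survive parsing depends on the underlying libxml version, so the first number is not something to rely on; the element count is.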

If you only need the number of child elements, you can do this:

$x = new DOMXPath($DOM);
var_dump($x->evaluate('count(*)',$tr->item($i)));
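
One detail worth knowing about the XPath route: `count()` is an XPath number, which PHP hands back as a float. A small sketch, assuming a made-up three-cell row, that casts the result so it can be used as a loop bound or array size:

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<table><tr><td>a</td><td>b</td><td>c</td></tr></table>');

$xpath = new DOMXPath($doc);
$row   = $doc->getElementsByTagName('tr')->item(0);

// The second argument is the context node, so 'td' means "td children of $row".
// evaluate() returns a float for numeric expressions; cast it for integer use.
$cells = (int) $xpath->evaluate('count(td)', $row);

var_dump($cells); // int(3)
```

Using `count(td)` instead of `count(*)` restricts the count to one tag name, which may be closer to what is wanted when a row can also hold `th` cells.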

Or iterate over just the child elements with an XPath query:

foreach($x->query('*',$tr->item($i)) as $child){
    var_dump($child->nodeValue);
}

foreach-ing through the ->childNodes has my preference for simple 'array-building'. Keep in mind you can just foreach through a DOMNodeList as if it were an array, which saves a lot of hassle.
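
To illustrate what "as if they were arrays" covers, here is a short sketch with a made-up list. A DOMNodeList can be iterated with foreach and copied into a real array with `iterator_to_array()`, but it is still an object, so individual nodes are fetched with `item($i)`.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<ul><li>one</li><li>two</li></ul>');
$items = $doc->getElementsByTagName('li');

// foreach works directly on the node list
$texts = [];
foreach ($items as $li) {
    $texts[] = $li->textContent;
}
print_r($texts);

// DOMNodeList is Traversable, so it can be copied into a plain array
// when array functions are needed
$asArray = iterator_to_array($items);
echo count($asArray), "\n"; // 2
```

As of PHP 7.2, DOMNodeList also implements Countable, so `count($items)` gives the same number as `$items->length`.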

Building a simple array from a table:

$DOM = new DOMDocument;
$DOM->loadHTML($str);   // loading page contents
$table = $DOM->getElementById('tableId'); 
$result = array();
foreach($table->childNodes as $row){
   // skip whitespace text nodes, which have no tagName
   if(!$row instanceof DOMElement || strtolower($row->tagName) != 'tr') continue;
   $rowdata = array();
   foreach($row->childNodes as $cell){
       if(!$cell instanceof DOMElement || strtolower($cell->tagName) != 'td') continue;
       $rowdata[] = $cell->textContent;
   }
   $result[] = $rowdata;
}
var_dump($result);
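
A variant of the same array-building, sketched with XPath instead of walking `childNodes`. The table markup and the id `t` are made up for illustration. Because the query selects elements only, no text-node filtering is needed, and it also picks up `th` cells and rows wrapped in a `tbody`.

```php
<?php
$html = '<table id="t">'
      . '<tr><th>Name</th><th>Qty</th></tr>'
      . '<tr><td>apple</td><td>3</td></tr>'
      . '</table>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = [];
// every row inside the table, at any depth
foreach ($xpath->query('//table[@id="t"]//tr') as $row) {
    $rowData = [];
    // header or data cells that are direct children of this row
    foreach ($xpath->query('th|td', $row) as $cell) {
        $rowData[] = trim($cell->textContent);
    }
    $result[] = $rowData;
}
print_r($result);
```

Note that `//tr` would also match rows of any table nested inside a cell, so for nested tables the childNodes loop above is the safer choice.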
Wrikken
  • In your second code, you dump/saveXML the node if instanceof DOMElement. What would the others have? Will I be missing something by omitting them? – sergio Jun 24 '15 at 05:48
  • @sergio: instances of `DOMNode` which aren't `DOMElement`s include, among others, raw text that is not inside a child element, for instance random whitespace (all those newlines and tabs usually found in HTML to format it visually). For a complete list of what is a `Node` but not an `Element`, look at the [DOM specification](http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-1590626202) – Wrikken Jun 26 '15 at 17:08