Use the DOM extension, for example. Its DOMXPath class is particularly useful for this kind of task.
You can express the listed conditions with a single XPath expression like this:
//table[@class="space"]//tr[count(td) = 2]/td
where
- //table[@class="space"] selects all table elements in the document whose class attribute is exactly the string "space";
- //tr[count(td) = 2] selects all tr elements inside those tables that have exactly two td child elements;
- /td selects the td children of those rows.
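To see what each part of the expression contributes, you can evaluate the steps one at a time and count the matched nodes. This is a minimal sketch, assuming a cut-down copy of the sample table plus a second table with a different class (added here purely for illustration); countMatches() is a hypothetical helper, not part of the DOM extension.

```php
<?php
// Cut-down sample: the second table has a different class and should never match.
$html = <<<'HTML'
<table class="space">
  <tr><td>1</td><td>Mars</td></tr>
  <tr><td>2</td><td>Earth</td></tr>
  <tr><td>3</td></tr>
</table>
<table class="other">
  <tr><td>4</td><td>Venus</td></tr>
</table>
HTML;

// Hypothetical helper: parse the HTML and return how many nodes $expr selects.
function countMatches(string $html, string $expr): int
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    return $xpath->query($expr)->length;
}

$steps = [
    '//table[@class="space"]',                       // tables whose class is exactly "space"
    '//table[@class="space"]//tr',                   // every row inside those tables
    '//table[@class="space"]//tr[count(td) = 2]',    // only rows with exactly two td children
    '//table[@class="space"]//tr[count(td) = 2]/td', // the td cells of those rows
];
foreach ($steps as $expr) {
    printf("%-48s %d\n", $expr, countMatches($html, $expr));
}
```

With this input the four counts are 1, 3, 2 and 4: the single-cell row and the "other" table are filtered out at different steps.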
Sample implementation:
$html = <<<'HTML'
<table class="space">
  <thead></thead>
  <tbody>
    <tr>
      <td class="marsia">1</td>
      <td class="mars">
        <div>Mars</div>
      </td>
    </tr>
    <tr>
      <td class="earthia">2</td>
      <td class="earth">
        <div>Earth</div>
      </td>
    </tr>
    <tr>
      <td class="earthia">3</td>
    </tr>
  </tbody>
</table>
HTML;
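$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select the td cells of the rows that have exactly two td children
$cells = $xpath->query('//table[@class="space"]//tr[count(td) = 2]/td');
$i = 0;
foreach ($cells as $td) {
    if (++$i % 2) {
        // Odd-numbered cell: the first cell of a row holds the number
        $number = $td->nodeValue;
    } else {
        // Even-numbered cell: the second cell holds the planet name (inside a div)
        $planet = trim($td->textContent);
        printf("%d: %s\n", $number, $planet);
    }
}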
Output:
1: Mars
2: Earth
The code above is meant as an illustration rather than production code, because it does not scale well: the loop logic depends on the XPath expression returning exactly two cells per row, in document order. In practice, you may want to select the rows, iterate over them, and put the extra conditions into the loop, e.g.:
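// Select every row of the table, then check the remaining conditions in PHP
$rows = $xpath->query('//table[@class="space"]//tr');
foreach ($rows as $tr) {
    // The second argument makes the expression relative to the current row
    $cells = $xpath->query('.//td', $tr);
    if ($cells->length < 2) {
        continue; // skip rows with fewer than two cells
    }
    $number = $cells[0]->nodeValue;
    $planet = trim($cells[1]->textContent);
    printf("%d: %s\n", $number, $planet);
}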
Inside the loop, DOMXPath::query() is called with an XPath expression relative to the current row ($tr, passed as the context node), and the code then checks whether the returned DOMNodeList contains at least two cells. The rest of the code is straightforward.
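The context node only affects relative expressions. A path that starts with // is absolute and still searches the whole document, even when a context node is passed. A minimal sketch of the difference, assuming a small two-row table made up for illustration:

```php
<?php
$html = <<<'HTML'
<table class="space">
  <tr><td>1</td><td>Mars</td></tr>
  <tr><td>2</td><td>Earth</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$firstRow = $xpath->query('//tr')->item(0);

// Absolute path: the context node is ignored, so all cells in the document match.
$absolute = $xpath->query('//td', $firstRow)->length;
// Relative path: evaluated against $firstRow only.
$relative = $xpath->query('.//td', $firstRow)->length;

printf("absolute: %d, relative: %d\n", $absolute, $relative);
```

This prints "absolute: 4, relative: 2", which is why the leading dot in './/td' matters.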
You can also use the SimpleXML extension, which supports XPath as well, but it is much less flexible than the DOM extension.
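A minimal SimpleXML sketch of the same task, assuming a simplified table made up for illustration. simplexml_load_string() expects well-formed XML, so this version lets DOMDocument::loadHTML() do the parsing and imports the result with simplexml_import_dom(). Casting a SimpleXMLElement to string returns only its own text, not the text of nested elements such as the div, so the planet name is read through dom_import_simplexml(), which illustrates the flexibility gap.

```php
<?php
$html = <<<'HTML'
<table class="space">
  <tr><td>1</td><td><div>Mars</div></td></tr>
  <tr><td>2</td><td><div>Earth</div></td></tr>
  <tr><td>3</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
// Wrap the DOM tree parsed from HTML in a SimpleXMLElement
$sxml = simplexml_import_dom($doc);
$rows = $sxml->xpath('//table[@class="space"]//tr[count(td) = 2]');

$lines = [];
foreach ($rows as $tr) {
    $number = (string) $tr->td[0];
    // (string) would miss the text inside the div, so fall back to the DOM node
    $planet = trim(dom_import_simplexml($tr->td[1])->textContent);
    $lines[] = sprintf('%d: %s', $number, $planet);
}
echo implode("\n", $lines), "\n";
```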
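For huge documents, consider a streaming parser that does not build the whole tree in memory, such as XMLReader (a pull parser) or the SAX-style XML Parser extension. Note that both expect well-formed XML rather than arbitrary HTML.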
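A minimal streaming sketch with XMLReader, under stated assumptions: the input is well-formed XHTML, tables are not nested (the $inSpaceTable flag is a simplification), and the document is read from a string for brevity, whereas a real large file would be opened with XMLReader::open() and a path. Only the current row is copied into a small DOMDocument via expand() and importNode(), so the full tree is never built.

```php
<?php
// Assumption: well-formed XHTML input; XMLReader has no HTML mode.
$xml = <<<'XML'
<html><body>
<table class="space">
  <tr><td>1</td><td><div>Mars</div></td></tr>
  <tr><td>2</td><td><div>Earth</div></td></tr>
  <tr><td>3</td></tr>
</table>
</body></html>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$inSpaceTable = false; // simplistic state: assumes tables are not nested
$lines = [];

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'table') {
        $inSpaceTable = $reader->getAttribute('class') === 'space';
    } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->name === 'table') {
        $inSpaceTable = false;
    } elseif ($inSpaceTable && $reader->nodeType === XMLReader::ELEMENT && $reader->name === 'tr') {
        // Copy just this row into a small DOM tree
        $rowDoc = new DOMDocument();
        $tr = $rowDoc->importNode($reader->expand(), true);
        $cells = [];
        foreach ($tr->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'td') {
                $cells[] = trim($child->textContent);
            }
        }
        if (count($cells) === 2) {
            $lines[] = sprintf('%d: %s', $cells[0], $cells[1]);
        }
    }
}
$reader->close();
echo implode("\n", $lines), "\n";
```

The reader keeps descending into the row's children after expand(), which is harmless here because none of them is a tr element.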