I have an HTML file with the following structure:
<h3>Heading 1</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/rows -->
</table>
<h3>Heading 2</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/rows -->
</table>
I want to get JUST the first table with all of its content, so I load the HTML file:
<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://www.example.com'));
libxml_clear_errors();
?>
All tables have the same classes and NO specific IDs. That's why the only way I could think of was to grab the h3 tag with the value "Heading 1". I already found this one, which works well for me. (Since other tables and captions could be added later, that solution is not ideal.)
How can I grab the h3 tag WITH the value "Heading 1", and how can I then select the table that follows it?
EDIT#1: I don't have access to the HTML File, so I can't edit it.
EDIT#2: My Solution (thanks to Martin Henriksen) for now is:
<?php
// The XML version argument is a string, not a float.
$doc = new DOMDocument('1.0');
libxml_use_internal_errors(true);
$doc->loadHTML(file_get_contents('http://example.com'));
libxml_clear_errors();
foreach ($doc->getElementsByTagName('h3') as $element) {
    // Braces matter here: without them only the $table assignment is conditional
    // and the rest of the body runs for every <h3>.
    if ($element->nodeValue == 'exampleString') {
        // Two hops, assuming a whitespace text node sits between </h3> and <table>.
        $table = $element->nextSibling->nextSibling;
        $innerHTML = '';
        foreach ($table->childNodes as $child) {
            $innerHTML .= $child->ownerDocument->saveXML($child);
        }
        echo $innerHTML;
        file_put_contents("test.xml", $innerHTML);
    }
}
?>
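An alternative to walking nextSibling by hand would be an XPath query, which does not depend on whether a whitespace text node sits between the heading and the table. The following is only a minimal sketch: the helper name tableAfterHeading and the sample markup are made up for illustration, and it assumes the heading text contains no double quotes.

```php
<?php
// Hypothetical helper (not part of the original post): returns the outer HTML
// of the first <table> that follows an <h3> whose text equals $heading,
// or null if no such heading or table exists.
function tableAfterHeading(string $html, string $heading): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // normalize-space() ignores leading/trailing whitespace in the heading text.
    // Assumption: $heading contains no double quotes.
    $query = sprintf(
        '//h3[normalize-space(.) = "%s"]/following-sibling::table[1]',
        $heading
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serializes just that node.
    return $doc->saveHTML($nodes->item(0));
}

// Sample markup mirroring the structure described in the question.
$html = '<h3>Heading 1</h3><table><tbody><tr><td>first</td></tr></tbody></table>'
      . '<h3>Heading 2</h3><table><tbody><tr><td>second</td></tr></tbody></table>';

echo tableAfterHeading($html, 'Heading 1');
```

Because following-sibling::table[1] picks the nearest table after the matching heading, adding more headings and tables elsewhere in the page should not change which table is returned.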