
I have an HTML file with the following structure:

<h3>Heading 1</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns and rows -->
  </table>
<h3>Heading 2</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns and rows -->
  </table>

I want to get just the first table with all of its content. First, I load the HTML file:

<?php 
  $dom = new DOMDocument();
  libxml_use_internal_errors(true);
  $dom->loadHTML(file_get_contents('http://www.example.com'));
  libxml_clear_errors();
?>

All tables have the same classes and no specific IDs. That's why the only approach I could think of was to find the h3 tag with the text "Heading 1". I already found one solution that works for me, but since other tables and captions could be added to the page later, it isn't ideal.
How can I select the h3 tag with the text "Heading 1", and how can I then select the table that follows it?

EDIT #1: I don't have access to the HTML file, so I can't edit it.
EDIT #2: My current solution (thanks to Martin Henriksen) is:

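<?php
    $doc = new DOMDocument('1.0');
    libxml_use_internal_errors(true);
    $doc->loadHTML(file_get_contents('http://example.com'));
    libxml_clear_errors();
    foreach($doc->getElementsByTagName('h3') as $element){
      // Everything that uses $table has to live inside the if block
      if(trim($element->nodeValue) == 'exampleString'){
        // Skip the whitespace text node(s) between the <h3> and the <table>
        $table = $element->nextSibling;
        while($table !== null && $table->nodeType !== XML_ELEMENT_NODE)
          $table = $table->nextSibling;
        if($table === null)
          break;
        $innerHTML = '';
        foreach ($table->childNodes as $child) {
          $innerHTML .= $doc->saveXML($child);
        }
        echo $innerHTML;
        file_put_contents("test.xml", $innerHTML);
        break; // only the first matching heading is needed
      }
    }
  ?>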
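A sketch of the same approach wrapped in reusable functions, which walks past any number of whitespace or comment nodes instead of relying on a fixed number of nextSibling hops. The helper names findTableAfterHeading and innerHtml are hypothetical, and the sample HTML is made up to mirror the structure in the question; it assumes the heading text matches exactly after trimming.

```php
<?php
// Hypothetical helper: returns the <table> that directly follows the <h3>
// whose trimmed text equals $headingText, or null if there is none.
function findTableAfterHeading(DOMDocument $doc, string $headingText): ?DOMElement
{
    foreach ($doc->getElementsByTagName('h3') as $h3) {
        if (trim($h3->textContent) !== $headingText) {
            continue;
        }
        // Walk forward past whitespace text nodes and comments.
        for ($node = $h3->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement) {
                // The first element after the heading must be the table.
                return $node->nodeName === 'table' ? $node : null;
            }
        }
        return null;
    }
    return null;
}

// Hypothetical helper: concatenates the serialized child nodes ("inner HTML").
function innerHtml(DOMElement $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample shaped like the structure in the question.
$sample = <<<HTML
<h3>Heading 1</h3>
  <table>
    <thead><tr><th>A</th></tr></thead>
    <tbody><tr><td>first</td></tr></tbody>
  </table>
<h3>Heading 2</h3>
  <table>
    <thead><tr><th>B</th></tr></thead>
    <tbody><tr><td>second</td></tr></tbody>
  </table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();

$table = findTableAfterHeading($doc, 'Heading 1');
$inner = $table !== null ? innerHtml($table) : '';
echo $inner;
```

Returning null when the first element after the heading is not a table is a deliberate choice here, so a missing table is not silently replaced by the next heading's table.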
baumi_

2 Answers


You can find any tag in HTML using the simple_html_dom.php class. You can download it from this link: https://sourceforge.net/projects/simplehtmldom/?source=typ_redirect

Then:

<?php
include_once('simple_html_dom.php');

$htm  = "**YOUR HTML CODE**";
$html = str_get_html($htm);
// find() takes a CSS-style selector ("h3", not "<h3>"); index 0 returns the first match
$h3_tag = $html->find('h3', 0)->innertext;
echo "HTML code in h3 tag: ";
print_r($h3_tag);
?>
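For comparison, here is a sketch of the same "inner HTML of the first h3" lookup using PHP's built-in DOM extension instead of simple_html_dom, so no third-party file is needed. This is a swapped-in technique, not part of the answer above, and the sample HTML string is made up.

```php
<?php
// Built-in DOM equivalent of find('h3', 0)->innertext:
// take the first <h3> and serialize its child nodes.
$htm = '<h3>Heading <em>1</em></h3><table><tr><td>x</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom->loadHTML($htm);
libxml_clear_errors();

$h3 = $dom->getElementsByTagName('h3')->item(0); // null if there is no <h3>
$h3_inner = '';
if ($h3 !== null) {
    foreach ($h3->childNodes as $child) {
        $h3_inner .= $dom->saveHTML($child);
    }
}
echo "HTML code in h3 tag: ";
print_r($h3_inner);
```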

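You can fetch all the DOMElements with the tag h3 and check the text of each one via nodeValue. Once you have found the right h3 tag, you can move to the next node in the DOM tree with nextSibling. Note that nextSibling returns the next node of any type, so the whitespace between the </h3> and the <table> usually shows up as a text node first.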

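foreach($dom->getElementsByTagName('h3') as $element)
{
    if($element->nodeValue == 'Heading 1') {
        $table = $element->nextSibling;
        // nextSibling may be a whitespace text node; skip ahead to the next element (the <table>)
        while($table !== null && $table->nodeType !== XML_ELEMENT_NODE)
            $table = $table->nextSibling;
        break;
    }
}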
mrhn
    Remember that nextSibling can be tricky to work with: http://stackoverflow.com/questions/20851106/nextsibling-doesnt-work-when-working-with-php-domdocument-solved – mrhn May 20 '17 at 12:42
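Another option, which sidesteps the nextSibling pitfalls mentioned in the comment, is a single XPath query through PHP's DOMXPath class. This is a sketch under the assumption that the heading text matches "Heading 1" after whitespace normalization; the sample HTML is made up.

```php
<?php
// Select the table that follows the <h3> whose normalized text is "Heading 1",
// in one XPath query instead of walking nextSibling by hand.
$html = '<h3>Heading 1</h3>
  <table><tr><td>first</td></tr></table>
<h3>Heading 2</h3>
  <table><tr><td>second</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// following-sibling::*[1] is the next *element*, so whitespace text nodes are ignored;
// [self::table] keeps it only if that element is a table.
$nodes = $xpath->query("//h3[normalize-space(.)='Heading 1']/following-sibling::*[1][self::table]");
$table = $nodes->length > 0 ? $nodes->item(0) : null;

$tableHtml = $table !== null ? $dom->saveHTML($table) : '';
echo $tableHtml;
```

Using following-sibling::table[1] instead would be more lenient (it would also skip over, say, a paragraph between the heading and the table), at the cost of possibly picking up the next heading's table if the expected one is missing.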