I'm using DOMXPath to get the content of specific nodes. For each matching div, I want all of its text, excluding the text of any nested divs.
$html =
'<div itemscope="itemscope" itemtype="http://schema.org/Event">
<span itemprop="name"> Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)</span>
<meta itemprop="startDate" content="2016-04-21">
Thu, 04/21/16
8:00 p.m
<div itemprop="offers" itemscope="itemscope" itemtype="http://schema.org/AggregateOffer">
Priced from: <span itemprop="lowPrice">$35</span>
<span itemprop="offerCount">1938</span> tickets left
</div>
<meta itemprop="endDate" content="2020-3-2"> end date of year
<div itemprop="attendee" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
<meta itemprop="birthDate" content="1975-05-06">
<div itemprop="sibling" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Fatima Zohra</span>
<meta itemprop="birthDate" content="1991-6-5">Jan 6
</div>
</div>
</div>';
I first tried the following, but it did not return the nested divs:
$tags = $xpath->query("//div[@itemscope='itemscope'][not(self::div)]/text()");
My current attempt is the following, but it does not work either:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[not(ancestor::div)]');
foreach ($tags as $node) {
    echo $node->nodeValue; // body
}
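To make the goal concrete, here is a minimal sketch of the behaviour I am after, using a shortened sample instead of the full markup. It assumes "own text" means a div's direct text nodes plus the text inside its non-div children (such as the spans). The helper name `ownText` is hypothetical, not part of DOMXPath, and the approach is only one possible direction.

```php
<?php
// Sketch only: ownText() is a hypothetical helper, not a DOMXPath method.
function ownText(DOMXPath $xpath, DOMNode $div): string
{
    // Direct text children, plus text inside non-div children (e.g. <span>).
    // The query is evaluated relative to $div, so nested divs are skipped.
    $nodes = $xpath->query('./text() | ./*[not(self::div)]//text()', $div);

    $text = '';
    foreach ($nodes as $t) {
        $text .= $t->nodeValue;
    }

    // Collapse the runs of whitespace left over from the markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Shortened stand-in for the markup above (assumption: same nesting pattern).
$sample = '<div itemscope="itemscope">A <span>B</span>'
        . '<div itemscope="itemscope">C</div> D</div>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument;
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// Visit every itemscope div, nested ones included, and print its own text.
foreach ($xpath->query('//div[@itemscope="itemscope"]') as $div) {
    echo ownText($xpath, $div), PHP_EOL;
}
```

Each div is handled separately, so the inner div's text is reported for the inner div only and is left out of the outer one. One known gap in this sketch: a div wrapped inside a span or other non-div child would still be included in the parent's text.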