
I have the following HTML, which I want to query with XPath:

<div class="buying">


<h1 class="parseasinTitle ">

<span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span>


</h1>
</div>

I just want to extract:

Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking)

so I am using textContent with the following XPath query:

$xpath_books->query('//span[@id="btAsinTitle"]')

but the result is:

Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) [Kindle Edition]

I think I have to exclude <span style="text-transform: capitalize; font-size: 16px;"> to get what I want. How can I do that?
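A minimal, self-contained reproduction of the problem might look like the following. It is a sketch: loading the markup from an inline string with DOMDocument::loadHTML() is an assumption, since the question does not show how the page is loaded; only $xpath_books and the query come from the question.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath_books = new DOMXPath($doc);

// The query from the question selects the outer span element.
$result = $xpath_books->query('//span[@id="btAsinTitle"]');

// textContent concatenates every descendant text node,
// so the nested span's "[Kindle Edition]" is included.
echo $result->item(0)->textContent, "\n";
```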

Andrey Rubshtein
Zaffar Saffee

2 Answers


Use this XPath:

//span[@id="btAsinTitle"]/text()
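As a self-contained sketch of this answer (the inline $html string and the loadHTML() setup are assumptions for illustration), the /text() query can be run and its result joined and trimmed like this. Trimming is needed because the selected text node still ends with the space that preceded the nested span.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// /text() matches only text nodes that are direct children of the span,
// so the text inside the nested span is never selected.
$textNodes = $xpath->query('//span[@id="btAsinTitle"]/text()');

$title = '';
foreach ($textNodes as $textNode) {
    $title .= $textNode->nodeValue;
}

// The matched text node ends with the space before the nested span.
$title = trim($title);
echo $title, "\n";
```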
Kirill Polishchuk
  • Well, what I know is that the text() function extracts the text of a node, but I am confused here: why did it work in my case? Isn't [Kindle Edition] also text? – Zaffar Saffee Feb 04 '12 at 19:20
  • My guess is that [Kindle Edition] is enclosed in another <span>, so it was dropped, and only the text directly inside the selected span was extracted. Am I correct? – Zaffar Saffee Feb 04 '12 at 19:21
  • @NewBee, this query uses `text()` to select the child text nodes of `span[@id="btAsinTitle"]`. That span has only one child text node: `Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking)`. The text node `[Kindle Edition]` is a child of the other, nested `span`. – Kirill Polishchuk Feb 04 '12 at 19:23
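The difference between child and descendant text nodes can be checked directly. In this sketch (the inline $html string is an assumption), the outer span turns out to have two children, a DOMText and a DOMElement; /text() matches one text node, while //text() also descends into the nested span and matches two.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$span = $xpath->query('//span[@id="btAsinTitle"]')->item(0);

// List the direct children of the outer span by class.
foreach ($span->childNodes as $child) {
    echo get_class($child), "\n";
}

// Child axis: only text nodes directly under the outer span.
$direct = $xpath->query('//span[@id="btAsinTitle"]/text()');
// Descendant axis: also includes the text inside the nested span.
$all = $xpath->query('//span[@id="btAsinTitle"]//text()');

echo $direct->length, ' vs ', $all->length, "\n";
```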

Your XPath does return only the node with that id, but because the DOM is a tree of linked DOMNodes, the returned node still contains its child nodes. When you read nodeValue or textContent on the returned span, PHP returns the concatenated text of all its descendant DOMText nodes, including the text inside the child span holding "[Kindle Edition]".

      SPAN
     /    \
   TEXT   SPAN
            \
            TEXT

More on that at DOMDocument in PHP.
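To see the tree from the diagram for yourself, a small recursive walk over the span's children can print each node. This is a sketch: the dumpTree() helper is hypothetical (not part of PHP's DOM API), and the inline $html string is an assumption.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

// Hypothetical helper: prints element names and text nodes with indentation.
function dumpTree(DOMNode $node, int $depth = 0): void
{
    $label = $node instanceof DOMText
        ? 'TEXT "' . $node->nodeValue . '"'
        : strtoupper($node->nodeName);
    echo str_repeat('  ', $depth), $label, "\n";

    // Text nodes have no children, so only recurse when there are some.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            dumpTree($child, $depth + 1);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

dumpTree($xpath->query('//span[@id="btAsinTitle"]')->item(0));
```

The printed indentation mirrors the diagram: one SPAN with a TEXT child and a nested SPAN, which in turn holds the "[Kindle Edition]" TEXT node.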

If you want to fetch only the first text part, you have to fetch the nodeValue of the first childNode:

echo $result->item(0)->childNodes->item(0)->nodeValue;
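A runnable sketch of the childNodes approach follows; the inline $html string is an assumption, and $result mirrors the variable name in the answer. It checks that the first child really is a text node, because if the span ever started with an element, nodeValue would return that element's text instead.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = $xpath->query('//span[@id="btAsinTitle"]');
$first = $result->item(0)->childNodes->item(0);

// Only trust the value if the first child is a text node.
if ($first instanceof DOMText) {
    // trim() drops the space that precedes the nested span.
    echo trim($first->nodeValue), "\n";
}
```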

An alternative is to fetch that string directly with XPath:

echo $xpath->evaluate('string(//span[@id="btAsinTitle"]/text())');

See http://php.net/manual/en/domxpath.evaluate.php
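A self-contained version of the evaluate() approach might look like this (a sketch; the inline $html string is an assumption). Because the XPath expression yields a string, evaluate() returns a PHP string rather than a DOMNodeList. Using normalize-space() in place of string() additionally strips the trailing space left before the nested span.

```php
<?php
// Assumption: the markup from the question, loaded from an inline string.
$html = '<div class="buying"><h1 class="parseasinTitle "><span id="btAsinTitle">Top Ten Tips for Growing Your Own Tomatoes (The Basic Art of Italian Cooking) <span style="text-transform: capitalize; font-size: 16px;">[Kindle Edition]</span></span></h1></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string() converts the first matched text node to a string.
$raw = $xpath->evaluate('string(//span[@id="btAsinTitle"]/text())');

// normalize-space() also trims leading/trailing whitespace and
// collapses internal runs of whitespace to a single space.
$clean = $xpath->evaluate('normalize-space(//span[@id="btAsinTitle"]/text())');

var_dump($raw);
var_dump($clean);
```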

If you want to return the whole DOMText node instead, use:

//span[@id="btAsinTitle"]/text()
Gordon