PHP XPath substring-after only returning first result

Question

I am doing some HTML scraping and have hit a wall with this one query. I am trying to return a set of values from the following HTML page structure:

<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>

I am trying to get the "$20.00" prices returned in a list. If I use the following XPath:

id('product-grid')//p[@class="price"]

I get a list of the full strings, such as "Cash Price: $40.00". If I try the following query:

substring-after(id('product-grid')//p[@class="price"] , "Price: ")

I get the correct output, but only get the first result. Anyone know how I can get all results?

I am running PHP 5.3.3 with libxml 2.7.8 for the XPath. I am running the XPath query as follows:

$xpath = new DOMXPath($html);
$resultset = $xpath->query($query);

I have been googling like mad trying to find out why this is happening! Please help!

Tom · Answer 1 · 2011-09-18T11:49:01.680

1

You have to apply substring-after after getting your list.

 id('product-grid')//div[@class="price"][substring-after(., 'Price: ')]

This should work.

EDIT: This seems to work. However, I can't test the return value, as I don't know how to get the substring'd value. What do you use?
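
One possible way to get at the substring'd values, offered as a sketch rather than a verified fix for the asker's setup: select the nodes first, then call DOMXPath::evaluate() once per node with that node as the context, so substring-after() only ever sees a single node. The markup is copied from the question; the variable names and the use of //div[@id="product-grid"] in place of id('product-grid') are assumptions made for illustration.

```php
<?php
// Sample markup copied from the question.
$html = '<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$prices = array();
// Step 1: fetch the node list with an ordinary XPath 1.0 path.
foreach ($xpath->query('//div[@id="product-grid"]//div[@class="price"]') as $node) {
    // Step 2: evaluate the string function relative to each node in turn.
    $prices[] = $xpath->evaluate('substring-after(., "Price: ")', $node);
}

print_r($prices);
```

Unlike query(), evaluate() can return a plain string, which is why it is used for the second step here.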

edited Sep 18 '11 at 11:49

answered Sep 18 '11 at 10:15


1

Using a function on the axis is an XPath 2.0 feature. It is probably not available in a standard PHP environment. You should be able to apply it in a predicate filter: `id('product-grid')//p[@class="price"][substring-after(., 'Price: ')]`. Also, the sample XML shows `div` elements with `@class`, but the example XPath (and your answer) expect `p` to have `@class`. – Mads Hansen Sep 18 '11 at 11:45
@Mads Hansen, post edited to comply with 1.0. I used OP's code so I used p. Changed it to div indeed. – Tom Sep 18 '11 at 11:51

score 1 · Accepted Answer · edited May 23 '17 at 12:31

Sorry, but I don't think that this is possible in one step. As far as I know XPath 1.0 does not support function calls at the end of an XPath path. The answer here indicates the same.

Furthermore, you don't need to use id('product-grid') as the first path part, because the id is on the root element and does not need to be selected specially. If your sample XML is just a fragment of a larger XML document, the id() might be necessary though.

The following works as expected:

$xml = new DOMDocument();
// Load the sample markup from the question as XML.
$xml->loadXML('<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>');
$xpath = new DOMXPath($xml);
// Select every price node with XPath, then trim the label in PHP.
foreach ($xpath->query('//div[@class="price"]') as $n) {
    // Keep everything from the first "$" onward.
    var_dump(substr($n->nodeValue, strpos($n->nodeValue, '$')));
}

score 1 · Answer 3 · answered Sep 18 '11 at 16:21

The wanted processing cannot be specified as a single XPath 1.0 expression, because by definition any function that expects a single string argument but is given a node-set takes the string value of only the first node (in document order) of that node-set.

Also, unlike XPath 2.0, XPath 1.0 does not allow a function call as a location step.
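
The first-node rule described above can be observed directly in PHP. This is a minimal sketch, assuming the sample markup from the question loaded as XML; the variable names are made up for illustration, and the path uses div (as in the markup) rather than p.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>');
$xpath = new DOMXPath($doc);

// The path on its own matches all three price nodes.
$matched = $xpath->query('//div[@class="price"]')->length;

// Passed to a string function, the node-set is converted to the
// string value of its first node only, so a single string comes back.
$first = $xpath->evaluate('substring-after(//div[@class="price"], "Price: ")');

var_dump($matched, $first);
```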

Therefore, one solution is to issue this XPath expression:

substring-after((id('product-grid')//p[@class="price"])[$k], "Price: ")

N times, substituting $k in each expression with 1, 2, ..., N, where N is the result of evaluating another XPath expression:

count(id('product-grid')//p[@class="price"])
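
The two expressions above can be driven from PHP roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: it uses DOMXPath::evaluate() for both the count and the per-index substring, the variable names are hypothetical, and the path is written as //div[@class="price"] so that it matches the sample markup (id() would only resolve if the document declares which attribute is an ID).

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<div id="product-grid">
    <ul>
        <li><div class="price">Cash Price: $20.00</div></li>
        <li><div class="price">Cash Price: $30.00</div></li>
        <li><div class="price">Cash Price: $40.00</div></li>
    </ul>
</div>');
$xpath = new DOMXPath($doc);

// Assumed path for the sample markup (div, not p).
$nodes = '//div[@class="price"]';

// N: count() yields an XPath number, which PHP receives as a float.
$n = (int) $xpath->evaluate(sprintf('count(%s)', $nodes));

$prices = array();
for ($k = 1; $k <= $n; $k++) {
    // Substitute $k into the expression: (path)[k] picks the k-th node.
    $expr = sprintf('substring-after((%s)[%d], "Price: ")', $nodes, $k);
    $prices[] = $xpath->evaluate($expr);
}

print_r($prices);
```

This issues N + 1 XPath evaluations, so for large result sets a single query() followed by string handling in PHP is likely cheaper.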

Using XPath 2.0, one can do this with a single, simple expression:

id('product-grid')//p[@class="price"]/substring-after(., "Price: ")

which when evaluated produces exactly the wanted sequence of strings.
