Selecting a specific text node with phpQuery

Question

OK, this is driving me nuts. I'm trying to screen-scrape the following bit of seemingly trivial HTML with phpQuery:

<td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td>

The date is easy since it's wrapped in the nobr tag, so e.g. $element[':first-child']->text() does the trick. But how do I get my grubby mitts on the second bit of text?

CSS works on elements only, so nth-child(2) and nth-child(3) return the surrounding <br> tags, not the text.

If I could XPath it, the second node in .//text() would be gold. But apparently in phpQuery-land, the context for $element->xpath->query('.//text()') is the document root, so I get every single piece of text in the entire document!

Ideas? All the solutions in How do I select text nodes with jQuery? appear to involve JavaScript DOM operations, which are considerably less evil than PHP's terrible DOM API. Maybe just dumping the entire element to string and exploding it on <br> is the way to go...

score 3 · Accepted Answer · answered Nov 09 '10 at 12:28

From http://php.net/manual/en/domxpath.query.php

DOMNodeList DOMXPath::query ( string $expression [, DOMNode $contextnode [, boolean $registerNodeNS = true ]] )

So, this should work with td as context node:

$element->xpath->query('text()[1]',$element)
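To see what the context node changes, here is a minimal sketch using PHP's DOM extension directly, without phpQuery. Only the <td> comes from the question; the <p> and table wrapper, and all variable names, are assumptions added for illustration.

```php
<?php
// Only the <td> is from the question; the <p> and table wrapper are illustrative assumptions.
$html = '<p>unrelated</p><table><tr>'
      . '<td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td>'
      . '</tr></table>';

libxml_use_internal_errors(true);   // <nobr> is non-standard; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$td = $doc->getElementsByTagName('td')->item(0);

// No context node: the relative path is evaluated from the top of the document,
// so text outside the cell is matched as well.
$allText = $xpath->query('.//text()');

// With the td as context node, only text inside that cell is matched.
$cellText = $xpath->query('.//text()', $td);

// text()[1] is the first text node that is a direct child of the td.
$firstChildText = $xpath->query('text()[1]', $td)->item(0);

echo $allText->length, ' vs ', $cellText->length, "\n";
echo $firstChildText->textContent, "\n";
```

Note that text()[1] skips the date, because the date's text node is a child of the nobr element rather than of the td itself.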

Thanks, that basically works, but you need to pass in a DOMNode (instead of a phpQueryObject) and then convert back the result, and for reasons I don't really understand the XPath selector `.//text()[1]` does not work but `xpath->query('.//text()')->item(1)` does. So the final code ends up like this: `$src_time = pq($element)->xpath->query('.//text()', $element)->item(1); $src_time = pq($src_time)->text();` Pretty monstrous, but gets the job done, so thanks! – lambshaanxy Nov 10 '10 at 10:45
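A likely explanation for the difference noted in the comment: in XPath, a predicate such as [1] binds to the last step only, so .//text()[1] means "the first text child of each node under the context", which can match several nodes. Wrapping the path in parentheses applies the index to the whole result instead. The sketch below uses plain PHP DOM rather than phpQuery; the texts() helper and the variable names are hypothetical.

```php
<?php
libxml_use_internal_errors(true);   // <nobr> is non-standard; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td></tr></table>');
$xpath = new DOMXPath($doc);
$td = $doc->getElementsByTagName('td')->item(0);

// Collect the text of every node in a DOMNodeList (hypothetical helper).
function texts(DOMNodeList $nodes) {
    $out = array();
    foreach ($nodes as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

// [1] binds to the text() step: first text child of *each* node under the td,
// so both the nobr's text and the td's own first text node match.
$perParent = texts($xpath->query('.//text()[1]', $td));

// Parentheses apply the 1-based index to the whole node-set instead.
$second = texts($xpath->query('(.//text())[2]', $td));

// The workaround from the comment: item() is 0-based, so item(1) is the second node.
$viaItem = $xpath->query('.//text()', $td)->item(1)->textContent;

print_r($perParent);
print_r($second);
echo $viaItem, "\n";
```

So (.//text())[2] and ->item(1) should pick the same node; the two forms differ only in 1-based versus 0-based indexing.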

score 0 · Answer 2 · answered Nov 09 '10 at 12:51

Have you tried iterating through the text methods of $element[':first-child']->siblings()? That should give you access to all of their text, no?

cwallenpoole

Afraid not -- siblings returns just the two <br> tags. – lambshaanxy Nov 10 '10 at 10:38
OH! Ok. Sorry, I misread the br's (I thought one was an open tag and the other a closing tag for some reason) – cwallenpoole Nov 10 '10 at 14:10

score 0 · Answer 3 · answered Nov 10 '10 at 11:03

Using Alejandro's answer as the base, I came up with this little function:

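// Return the text of the $n-th (0-based) text node under $element, or null if there is none.
function nth_text(DOMNode $element, $n) {
  $xpath = new DOMXPath($element->ownerDocument);
  $node = $xpath->query('.//text()', $element)->item($n);
  return $node ? $node->textContent : null;
}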

Incidentally, that's pure PHP DOM, no phpQuery needed (or allowed, the argument has to be a DOMNode or DOMElement). And now the original problem is easy:

$src_date = nth_text($element, 0);
$src_time = nth_text($element, 1);

Yay!
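For completeness, a self-contained sketch of how this might be run end to end with plain PHP DOM. The table wrapper around the <td>, the way $element is obtained, and the $src_rest variable are assumptions for illustration; nth_text is as posted above.

```php
<?php
function nth_text($element, $n) {
  $xpath = new DOMXPath($element->ownerDocument);
  return $xpath->query('.//text()', $element)->item($n)->textContent;
}

libxml_use_internal_errors(true);   // <nobr> is non-standard; keep parser warnings quiet
$doc = new DOMDocument();
// The <td> is from the question; the table wrapper is an assumption so the parser keeps the cell.
$doc->loadHTML('<table><tr><td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td></tr></table>');

// A plain DOMElement, which is what nth_text expects.
$element = $doc->getElementsByTagName('td')->item(0);

$src_date = nth_text($element, 0);   // text inside <nobr>
$src_time = nth_text($element, 1);   // text after the first <br>
$src_rest = nth_text($element, 2);   // text after the second <br> (hypothetical variable)

echo $src_date, ' | ', $src_time, ' | ', $src_rest, "\n";
```

Real-world markup is often indented, in which case whitespace-only text nodes may also be counted and shift the indexes.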
