OK, this is driving me nuts. I'm trying to screen-scrape the following bit of seemingly trivial HTML with phpQuery:

<td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td>

The date is easy since it's wrapped in the nobr tag, so e.g. $element[':first-child']->text() does the trick. But how do I get my grubby mitts on the second bit of text?

CSS selectors match elements only, so nth-child(2) and nth-child(3) return the surrounding <br> tags, not the text.

If I could XPath it, the second node in .//text() would be gold. But apparently in phpQuery-land, the context for $element->xpath->query('.//text()') is the document root, so I get every single piece of text in the entire document!

Ideas? All the solutions in "How do I select text nodes with jQuery?" appear to involve JavaScript DOM operations, which are considerably less evil than PHP's terrible DOM API. Maybe just dumping the entire element to a string and exploding it on <br> is the way to go...

lambshaanxy

3 Answers

From http://php.net/manual/en/domxpath.query.php

DOMNodeList DOMXPath::query ( string $expression [, DOMNode $contextnode [, boolean $registerNodeNS = true ]] )

So, this should work with td as context node:

$element->xpath->query('text()[1]',$element)
  • Thanks, that basically works, but you need to pass in a DOMNode (instead of a phpQueryObject) and then convert back the result, and for reasons I don't really understand the XPath selector `.//text()[1]` does not work but `xpath->query('.//text()')->item(1)` does. So the final code ends up like this: `$src_time = pq($element)->xpath->query('.//text()', $element)->item(1); $src_time = pq($src_time)->text();` Pretty monstrous, but gets the job done, so thanks! – lambshaanxy Nov 10 '10 at 10:45
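A possible explanation for the indexing puzzle in the comment above, sketched with plain PHP DOM rather than phpQuery: XPath positions count from 1 while DOMNodeList::item() counts from 0, and in `.//text()[1]` the predicate applies per parent element rather than to the whole result set. The table wrapper and all variable names here are illustrative assumptions, not part of the original page.

```php
<?php
// Suppress libxml warnings about the non-standard <nobr> tag.
libxml_use_internal_errors(true);

// The cell from the question, wrapped in a table so the parser accepts it.
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td></tr></table>');
$td = $doc->getElementsByTagName('td')->item(0);
$xpath = new DOMXPath($doc);

// Every text node below <td>, in document order; item() counts from 0,
// so item(1) is the second text node.
$all = $xpath->query('.//text()', $td);
$time_by_item = $all->item(1)->textContent;

// The same node selected purely in XPath; positions count from 1, and the
// parentheses make [2] apply to the whole node set.
$time_by_xpath = $xpath->query('(.//text())[2]', $td)->item(0)->textContent;

// Without the parentheses, [1] is evaluated per parent, so this matches the
// first text child of <nobr> and the first text child of <td>.
$first_per_parent = $xpath->query('.//text()[1]', $td);

// Direct text children of <td> only, as in the answer's expression.
$first_direct = $xpath->query('text()[1]', $td)->item(0)->textContent;
```

Under these assumptions, `text()[1]` relative to the td skips the date entirely because that text lives inside nobr, which may be why the answer's expression and the `.//` variant behave differently.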

Have you tried iterating over $element[':first-child']->siblings() and calling text() on each one? That should give you access to all of their text, no?

cwallenpoole

Using Alejandro's answer as the base, I came up with this little function:

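function nth_text(DOMNode $element, $n) {
  // Run the query relative to $element instead of the document root.
  $xpath = new DOMXPath($element->ownerDocument);
  $node = $xpath->query('.//text()', $element)->item($n);
  // item() returns null when $n is out of range.
  return $node === null ? null : $node->textContent;
}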

Incidentally, that's pure PHP DOM, no phpQuery needed (or allowed: the argument has to be a DOMNode such as a DOMElement, not a phpQueryObject). And now the original problem is easy:

$src_date = nth_text($element, 0);
$src_time = nth_text($element, 1);

Yay!
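For anyone who wants to try this without phpQuery, here is a minimal self-contained sketch that runs the function against the cell from the question. The table wrapper, the null guard inside nth_text, and the $last_text variable are assumptions added for illustration.

```php
<?php
// Suppress libxml warnings about the non-standard <nobr> tag.
libxml_use_internal_errors(true);

// Return the text of the n-th (0-based) text node below $element,
// or null if there is no such node.
function nth_text(DOMNode $element, $n) {
  $xpath = new DOMXPath($element->ownerDocument);
  $node = $xpath->query('.//text()', $element)->item($n);
  return $node === null ? null : $node->textContent;
}

// The cell from the question, wrapped in a table so the parser accepts it.
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td><nobr>10-05-2009</nobr><br>06:10<br>17:35 -1</td></tr></table>');
$element = $doc->getElementsByTagName('td')->item(0);

$src_date  = nth_text($element, 0);  // text inside <nobr>
$src_time  = nth_text($element, 1);  // text after the first <br>
$last_text = nth_text($element, 2);  // text after the second <br>
$missing   = nth_text($element, 3);  // no fourth text node
```

If the real markup has whitespace between tags, those whitespace runs count as text nodes too, so the indices may shift.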

lambshaanxy