Using PHP DOM to achieve the same as this Regular Expression?

Question

I now understand that it's not advised to use regular expressions to parse HTML. I'm using the following regex to get the data inside of a <td> element that comes directly after a <th> element.

$string = "</th><td>Capture This</td>";
$pattern = "/<\/th>.*<td>(.*)<\/td>$/";

preg_match($pattern, $string, $matches);

// $matches[0] holds the whole match; the captured <td> contents are in $matches[1].
echo("<pre>" . $matches[1] . "</pre>");

Can somebody please explain to me how I'd go about capturing the contents of a <td> element that comes directly after the closing tag of a <th> element using PHP's DOMDocument or similar functionality?

Have a look at [`nextSibling`](http://www.php.net/manual/en/class.domnode.php#domnode.props.nextsibling). – Felix Kling, Nov 22 '11 at 17:45
Here is a little code and explanation: http://pastebin.com/rGNBbVAK – hhwhy, Nov 22 '11 at 18:13
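
The `nextSibling` hint above can be sketched roughly as follows with PHP's built-in DOM extension. This is only a minimal sketch: the sample markup and the helper name `firstTdAfterTh` are assumptions made for illustration, since the real page source is not shown here. The loop skips whitespace text nodes, because `nextSibling` returns any node type, not just elements.

```php
<?php
// Hypothetical sample markup; the real page structure is not shown in the question.
$html = '<table><tr><th>Label</th><td>Capture This</td></tr></table>';

/**
 * Hypothetical helper: returns the trimmed text of the first <td> that is
 * the next element sibling of a <th>, or null if no such cell exists.
 */
function firstTdAfterTh(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('th') as $th) {
        $node = $th->nextSibling;
        // Skip text nodes and comments that sit between the two cells.
        while ($node !== null && $node->nodeType !== XML_ELEMENT_NODE) {
            $node = $node->nextSibling;
        }
        if ($node !== null && $node->nodeName === 'td') {
            return trim($node->textContent);
        }
    }
    return null;
}

echo firstTdAfterTh($html), PHP_EOL;
```

Unlike the regex, this does not depend on the two tags being adjacent in the raw string, only on the cells being siblings in the parsed tree.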

score 0 · Answer 1 · answered Nov 22 '11 at 17:51

It can easily be fetched with Simple HTML DOM for PHP:

http://simplehtmldom.sourceforge.net/

Post some more of the source and I will give you the element path.

answered Nov 22 '11 at 17:51

abcde123483


Ah! That library looks great, but yeah if you can help me that would be awesome. Here is a pastebin: http://pastebin.com/rGNBbVAK – hhwhy Nov 22 '11 at 17:53
As I said, you did not post enough of the HTML source for me to give a reliable XPath – abcde123483 Nov 22 '11 at 17:55
Sorry about that, here is more code. The text inside the preceding element is ALWAYS the same and completely unique. So basically, I need to get the contents of a <td> element that follows a <th> element whose text says "Unique", for example. http://pastebin.com/rGNBbVAK – hhwhy Nov 22 '11 at 17:59
Suggested third party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). – Gordon Nov 22 '11 at 18:12
@bow-viper1 your example still presents us with precious little information compared with the full source of the page – abcde123483 Nov 22 '11 at 18:26
@ulvund I've just pastebin'd the full form. There is no identifiable information outside of that, unfortunately :( http://pastebin.com/Vx8kGU7V – hhwhy Nov 22 '11 at 22:46
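
The requirement described in the comments (a `<td>` directly following a `<th>` whose text is always the same) could be expressed as a single XPath query with PHP's `DOMXPath`. The sketch below is an assumption-laden illustration, not a tested solution for the linked form: the sample markup, the label "Unique", and the helper name `tdAfterThWithText` are all hypothetical.

```php
<?php
// Hypothetical sample markup; "Unique" stands in for the real header text.
$html = '<table>
  <tr><th>Other</th><td>Ignore This</td></tr>
  <tr><th>Unique</th><td>Capture This</td></tr>
</table>';

/**
 * Hypothetical helper: returns the trimmed text of the <td> that directly
 * follows the <th> whose text equals $label, or null if there is no match.
 */
function tdAfterThWithText(string $html, string $label): ?string
{
    // The label is embedded in a double-quoted XPath string literal,
    // so this sketch rejects labels that contain a double quote.
    if (strpos($label, '"') !== false) {
        throw new InvalidArgumentException('Label must not contain a double quote.');
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Take the first element sibling after the matching <th>,
    // and keep it only if that element is a <td>.
    $query = sprintf(
        '//th[normalize-space(.) = "%s"]/following-sibling::*[1][self::td]',
        $label
    );
    $cells = $xpath->query($query);
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    return trim($cells->item(0)->textContent);
}

echo tdAfterThWithText($html, 'Unique'), PHP_EOL;
```

Using `normalize-space(.)` makes the comparison tolerant of leading or trailing whitespace inside the header cell; if the real header contains nested markup or extra text, the predicate would need adjusting.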
