I want to parse HTML into a DOM tree and find all the text that is NOT inside <a> tags. I googled it and found "PHP Simple HTML DOM Parser", which seems able to parse HTML into a DOM tree. However, I can only find the elements that are inside <a> tags, not the text outside them.
P.S. It doesn't support the CSS3 :not selector yet.
Does anyone have experience with this? Thank you.
2

Charles Sprayberry

Tattat
-
Suggested third-party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of string parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). Also see [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Jul 19 '11 at 07:41
2 Answers
1
I hope I'm not misunderstanding the question, but can't you use PHP's built-in DOM functions to find the text inside the <a> tags?

    // Parse the page with PHP's built-in DOM extension.
    $doc = new DOMDocument();
    $doc->loadHTMLFile("http://blahblah.com/blah.html");
    // Print the text of every <a> element.
    $elem_list = $doc->getElementsByTagName("a");
    foreach ($elem_list as $elem) {
        echo $elem->textContent;
    }

If you want the text outside the <a> tags instead, I would remove all <a> tags and their contents (for example with regular expressions) and then load the resulting HTML into your DOM parser of choice.
Update: Even better, parse the HTML right away and use the built-in functions to remove the <a> tags, or loop through all tags and just skip the <a> tags. Regex with HTML should be avoided.
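
The "parse first, then remove the <a> tags" idea could be sketched roughly as follows. This is only a minimal illustration using PHP's built-in DOM extension; the function name `textWithoutLinks` and the sample markup are made up for the example and do not come from the question.

```php
<?php
// Hypothetical helper: returns the document text with every <a> element removed.
function textWithoutLinks(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before removing nodes.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $link) {
        $link->parentNode->removeChild($link);
    }

    // textContent concatenates all remaining text nodes in document order.
    return $doc->documentElement->textContent;
}

// Made-up sample input for illustration.
$sample = '<p>Hello <a href="#">skip me</a> world</p>';
echo textWithoutLinks($sample);
```

Note that the remaining text keeps its original whitespace and would also include the contents of any <script> or <style> elements, so some trimming or extra filtering may be needed. If the document should stay unmodified, querying the text nodes with DOMXPath and the expression `//text()[not(ancestor::a)]` is another option.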

newenglander
-
Oh, OK. The question text was a little misleading. I tried to correct it (my edits need to be peer reviewed); I hope that makes it clearer. – newenglander Jul 19 '11 at 09:26
0
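I have used this class many times. It's an excellent solution for parsing HTML/DOM in PHP.

    // Assumes simple_html_dom.php is on the include path.
    require_once 'simple_html_dom.php';

    $html = new simple_html_dom();
    // Load your HTML as a string.
    $html->load('........ HTML ..........');
    $a = $html->find('a');
    $text = '';
    // innertext is the element's inner HTML; use plaintext for text only.
    foreach ($a as $link) {
        $text .= $link->innertext;
    }

The variable $text now contains all the text in the <a> tags. Hope it helps.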

Imran Naqvi