17

Possible Duplicate:
How to parse and process HTML with PHP?

I'm looking into HTML DOM parsers for PHP. I've found PHP Simple HTML DOM Parser. Are there any others I should be looking at?

StackOverflowNewbie

3 Answers

19

Yes. Simple HTML DOM is fine, but it is an order of magnitude slower than the built-in DOM parser.

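// Parse the markup; the @ suppresses the warnings libxml emits for malformed HTML.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);

// Collect the href attribute of every <a> element in the document.
foreach ($x->query("//a") as $node) {
    $data['dom']['href'][] = $node->getAttribute("href");
}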

Use that.

Byron Whitlock
  • Is there a way to make `query` return a node instead of a node list? For example, a page has only one h1 tag. I want to get its nodeValue, but I don't think I should have to iterate through a node list. – StackOverflowNewbie Dec 02 '10 at 01:16
  • 2
    You should be able to use `$node[0]` to get the first node in the list. Or just iterate and break. I just iterate and break. If the query returns nothing I don't get any errors that way. – Byron Whitlock Dec 02 '10 at 18:07
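
For the single-h1 case raised in the comments above, here is a minimal sketch using `DOMNodeList::item(0)`, which returns the first node or null when the list is empty, so no loop is needed. The sample markup and the variable names (`$title`, `$titleText`, `$missing`) are made up for illustration, and `libxml_use_internal_errors()` is used here as one alternative to the `@` operator, not as the only way to silence parser warnings.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body><h1>Page title</h1><p>Body text</p></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of emitting them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; item(0) yields the first node, or null if there is none.
$h1 = $xpath->query('//h1')->item(0);
$title = $h1 !== null ? $h1->nodeValue : null;

// evaluate() with the XPath string() function returns the text directly.
$titleText = $xpath->evaluate('string(//h1)');

// A query that matches nothing: item(0) is null rather than an error.
$missing = $xpath->query('//h2')->item(0);
```

Checking for null before reading `nodeValue` gives the same "no errors when nothing matches" behaviour as iterating and breaking.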
5

You can look at the built-in DOM extension:

http://php.net/dom
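
As a rough sketch of what the built-in extension looks like without XPath, the snippet below collects link targets with `DOMDocument::getElementsByTagName()`. The sample markup and the `$hrefs` variable are assumptions made for the example.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<p><a href="/one">One</a> <a href="/two">Two</a> <a>No href</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about imperfect HTML out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and keep only those that actually carry an href.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    if ($link->hasAttribute('href')) {
        $hrefs[] = $link->getAttribute('href');
    }
}
```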

KingCrunch
1

Recently I also found ganon, but in general PHP Simple HTML DOM Parser is the best!

Slav
  • 1
    PHP Simple HTML DOM Parser chokes if you try to crawl multiple pages, e.g. level 1: get 300 links (e.g. from a listing); level 2: go to each link, retrieve the details page, and fetch elements. All you get is a collection of reset errors (depending on server type), plus it is very slow. – Jeffz Sep 07 '12 at 18:06
  • ganon loaded only 2 of the elements I wanted, and when I tried to run Simple HTML DOM Parser my computer hung! – Yuseferi Dec 28 '14 at 19:42
  • I found ganon to be much slower than both PHP's built-in DOM and Simple HTML DOM Parser. Moreover, Simple HTML DOM seems to suffer from heavy memory leakage, and you have to manually clean up or reuse the allocated objects. – jahackbeth Feb 19 '15 at 06:12