I'm trying to create a simple tool to parse HTML files.

Specifically, I need it to get the value of the name attribute from every div tag.

The HTML I receive varies and I have no control over it, so when I try to use XPath I tend to get errors because the HTML is not always well-formed.

Any ideas?

Thanks,

Or Weinberger
  • possible duplicate of [Grabbing the href attribute of an A element](http://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element) – Gordon May 11 '11 at 10:27
  • If the HTML is malformed, use [`DOMDocument::loadHTML()`](http://de.php.net/manual/en/domdocument.loadhtml.php). That will make DOM use the HTML parser module which can handle most broken HTML fine and would allow you to use XPath then. – Gordon May 11 '11 at 10:29
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon May 11 '11 at 10:30
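
The suggestion in the comments above, loading the markup with `DOMDocument::loadHTML()` and then querying it with XPath, might look roughly like the sketch below. The sample markup and the variable names (`$html`, `$names`) are made up for illustration, and the call to `libxml_use_internal_errors()` is an assumption about how one might want to silence the parser warnings that broken HTML produces.

```php
<?php
// Hypothetical input: deliberately malformed (unclosed <div> and <p>).
$html = '<div name="first"><p>unclosed paragraph'
      . '<div name="second">nested</div>'
      . '<div>no name attribute</div>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// loadHTML() uses the HTML parser, which tolerates most broken markup.
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$names = [];
// Select the name attribute node of every <div> that has one.
foreach ($xpath->query('//div/@name') as $attr) {
    $names[] = $attr->nodeValue;
}

print_r($names);
```

Divs without a name attribute are simply not matched by `//div/@name`, so no extra check is needed inside the loop.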

1 Answer

There is also a great class called PHP Simple HTML DOM Parser, available at http://simplehtmldom.sourceforge.net/

It works fine with invalid HTML, but it needs a lot of memory to parse long HTML files.

shadowhorst