Parse HTML without xpath

Question

I'm trying to create a simple tool to parse HTML files.

Specifically, I need it to get all the name attributes out of all the div tags.

My HTML string varies and I don't have any control over it, so when I try to use XPath I tend to get errors because the HTML is not always well-formed.

Any ideas?

Thanks,

possible duplicate of [Grabbing the href attribute of an A element](http://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element) — Gordon, May 11 '11 at 10:27
If the HTML is malformed, use [`DOMDocument::loadHTML()`](http://de.php.net/manual/en/domdocument.loadhtml.php). That will make DOM use the HTML parser module which can handle most broken HTML fine and would allow you to use XPath then. — Gordon, May 11 '11 at 10:29
*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, May 11 '11 at 10:30
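The `DOMDocument::loadHTML()` suggestion in the comments above could look roughly like the sketch below. It assumes PHP with the DOM extension enabled. The function name `divNameAttributes` and the sample markup are made up for illustration and are not from the thread. Parser warnings about the broken markup are collected internally and discarded, so they are not printed as PHP warnings.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the value of
// the name attribute of every <div> that has one, in document order.
function divNameAttributes(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);   // HTML parser: tolerant of malformed markup

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath works on the repaired DOM tree.
    $xpath = new DOMXPath($doc);
    $names = [];
    foreach ($xpath->query('//div[@name]') as $div) {
        $names[] = $div->getAttribute('name');
    }
    return $names;
}

// Deliberately broken sample: unclosed tags and an unquoted attribute.
$broken = '<div name="first"><p>unclosed<div name=second>text</div><div>no name</div>';
print_r(divNameAttributes($broken));
```

The query `//div[@name]` skips div elements without a name attribute. Use `//div` and check `hasAttribute('name')` yourself if you need to see those as well.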

score 2 · Accepted Answer · answered May 11 '11 at 15:27

There is also a great class called PHP Simple HTML DOM Parser, available at http://simplehtmldom.sourceforge.net/

It works fine with invalid HTML, but needs a lot of memory when parsing long HTML files.

shadowhorst
