I am trying to select either a class or an id using PHP Simple HTML DOM Parser with absolutely no luck.

My example is very simple and seems to comply with the examples given in the manual
(simplehtmldom.sourceforge AT net/manual.htm), but it just won't work,
and it's driving me up the wall.
The other example scripts shipped with Simple HTML DOM work fine.

See the example: link text. This is the simplest example I have found. How can I parse it?

Should I do it with Perl instead? The example HTML page is invalid HTML.
I do not know whether Simple HTML DOM Parser can handle badly malformed HTML
(probably not).

If I cannot get it to run, I can try a Perl parser such as HTML::TreeBuilder::XPath.

ajreal
zero
  • Hi, dear community. Thanks for voting: I've earned a Student badge! I will try everything I can. This is my second attempt with Simple HTML DOM Parser. If anybody could have a look at this example, I would be happy. – zero Dec 05 '10 at 14:34
  • If anybody has a working example of Simple HTML DOM Parser, I would be happy. The examples on the developer site are not very helpful. – zero Dec 05 '10 at 14:36
  • Malformed HTML is very hard to parse. What are you trying to achieve by parsing it, what is the end result you are aiming at? (In case there is another way.) – Orbling Dec 05 '10 at 14:38
  • Hello Orbling, many thanks for the quick answer. I want to get the data out of the table. I guess it would be better to try this with another parser; perhaps I can find a Perl parser. What do you think? Is there a better way to handle the malformed HTML? – zero Dec 05 '10 at 15:27
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Dec 05 '10 at 17:32

1 Answer

Use Tidy to clean up the malformed HTML before parsing it using the PHP DOM parser.

http://www.php.net/manual/en/tidy.examples.basic.php
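
A minimal sketch of that approach, in case it helps. The sample markup, the `results` id, the `data` class, and the helper names `repairHtml` and `tableRows` are all made up for illustration; they are not taken from the page in the question. It assumes the `dom` extension is available, and it only calls Tidy when the `tidy` extension is loaded (otherwise it relies on libxml's own error recovery, which is less thorough).

```php
<?php
// Hypothetical malformed input: unquoted attribute, unclosed <td>, <tr> and <table>.
$badHtml = '<html><body><table id=results class="data wide">'
         . '<tr><td>Alice<td>42'
         . '<tr><td>Bob<td>17'
         . '</body>';

// Repair the markup with Tidy if the extension is loaded; otherwise return it unchanged.
function repairHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html;
    }
    $clean = tidy_repair_string($html, ['wrap' => 0], 'utf8');
    return $clean === false ? $html : $clean;
}

// Return the cell texts of every row in the tables matched by $tableQuery (an XPath expression).
function tableRows(string $html, string $tableQuery): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML(repairHtml($html));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query($tableQuery) as $table) {
        foreach ($xpath->query('.//tr', $table) as $tr) {
            $cells = [];
            foreach ($xpath->query('./td|./th', $tr) as $cell) {
                $cells[] = trim($cell->textContent);
            }
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Select the table by id.
$byId = tableRows($badHtml, '//table[@id="results"]');

// Select the table by class; the concat/normalize-space idiom matches one class among several.
$byClass = tableRows(
    $badHtml,
    '//table[contains(concat(" ", normalize-space(@class), " "), " data ")]'
);

print_r($byId);
```

Both queries should pick up the same table here. For a real page, replace `$badHtml` with the fetched markup and adjust the id or class in the XPath expressions to match it.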

Christian Joudrey
  • I've had to use this exact method in the past to parse arbitrary pages for information which may not be valid HTML. – rdrkt Dec 06 '12 at 01:24