4

I want to parse the page http://dizli.com/dizli/db.html using PHP.

But when I wrote this code:

$url = "http://dizli.com/dizli/db.html";
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false; 
$tables = $dom->getElementsByTagName('table');
$tr = $tables->item(2)->getElementsByTagName('tr');
$rows = $tables->item(0)->getElementsByTagName('td');

foreach($rows as $row)
{
    $movie = $row->getElementsByTagName('b');
    echo $movie;}

I got a bunch of errors:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and td in http://dizli.com/dizli/db.html, line: 54 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 81 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 106 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 115 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 126 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 128 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://dizli.com/dizli/db.html, line: 1575 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag blink invalid in http://dizli.com/dizli/db.html, line: 2190 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and b in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: td and font in http://dizli.com/dizli/db.html, line: 2200 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: body and center in http://dizli.com/dizli/db.html, line: 2225 in C:\development\app_server\C7\Lib\Tools\News.php on line 93

Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\development\app_server\C7\Lib\Tools\News.php on line 102

Can someone help me parse this page so that I can save the movies' names and the directors' names?

Thanks in advance. Zeeshan

Justin
Zeeshan Rang
  • Somewhat related - http://stackoverflow.com/questions/1148928/disable-warnings-when-loading-non-well-formed-html-by-domdocument-php – Phil May 03 '11 at 23:23

3 Answers

5

To hide the warnings and still work with that code, just add @ before $dom, like:

$html = @$dom->loadHTMLFile($url);
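
As an alternative to the @ operator, libxml can be told to collect parse errors internally, which keeps them available for inspection instead of discarding them. The following is only a sketch: the $sample markup is a made-up stand-in for the real page, and loadHTML() is used so the example runs without network access. The same switch should work with loadHTMLFile().

```php
<?php
// Hypothetical malformed markup standing in for the real page (assumption).
$sample = '<table><tr><td><font><b>Title</font></b></td></tr></table>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($sample);

// Read whatever the parser complained about, then empty the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

printf("loaded: %s, parser messages: %d\n", $loaded ? 'yes' : 'no', count($errors));
```

Each entry in $errors is a LibXMLError object, so the message and line number can still be logged if needed.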
Andrew Barber
Tom
1

The page is written in very old HTML (you can tell by the FONT tags, the capitalization, etc.), so <br> tags, and probably paragraphs and other elements as well, are not closed. I recommend using regular expressions to find them in this case.

Ry-
1

Your main problem is the last line:

echo $movie;

$movie is an instance of DOMNodeList, so you can't just echo it; you need to get its elements, for example $movie->item(0).

You can also just do a var_dump of $movie and see what that gets you.

You can possibly ignore the warnings; that depends on the output you get.
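
To illustrate the point about DOMNodeList, here is a minimal sketch that walks the list and reads each node's text instead of echoing the list itself. The $sample markup and the movie titles in it are invented for the example; the real page's table layout may differ, so the tag names to query are an assumption.

```php
<?php
// Hypothetical markup standing in for one table of the real page (assumption).
$sample = '<table><tr><td><b>First Movie</b></td><td><b>Second Movie</b></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($sample);

$titles = [];
foreach ($dom->getElementsByTagName('td') as $cell) {
    // getElementsByTagName() returns a DOMNodeList, so iterate over its nodes.
    foreach ($cell->getElementsByTagName('b') as $bold) {
        // textContent holds the text inside the <b> element.
        $titles[] = trim($bold->textContent);
    }
}

echo implode("\n", $titles), "\n";
```

A single node can also be read directly, for example with $cell->getElementsByTagName('b')->item(0)->textContent, after checking that item(0) is not null.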

jeroen