
In my localhost document root:

crawl.html

<html>
<body>
<p>
<form action="welcome.php" method="get">
Site to crawl: <input type="text" name="crawlThis">
<input type="submit">
</form>
</p>

</body>
</html> 

welcome.php

 <html>
 <body>

 <?php 
 include ("crawler.php");

 echo $crawl = new Crawler($_GET["crawlThis"]);

 $images = $crawl->get("images");

 $links = $crawl->get("links"); 

 echo $links;
 echo $images;

 ?>
 <br>

</body>
</html> 

and crawler.php

<?php

class Crawler {

    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)){
            return call_user_method($method, $this);
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)){
            preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)){
            preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }

}


/*$crawl = new Crawler($);

$images = $crawl->get('images');

$links = $crawl->get('links');*/

?>

The result page is just empty. I can't figure out whether I simply can't echo $images or whether my logic is wrong. I'm expecting a list of images, and then a list of links.

Also, do I have to include crawler.php, or will PHP search the containing directory for a class of the same name?

Sorry, coming to PHP from Java is a bit of a mindscrew.

  • Watch out, parsing HTML with regexes [leads to invasions by the elder gods](http://stackoverflow.com/q/1732348/168868). Please see http://htmlparsing.com/php.html – Charles Dec 24 '12 at 19:52
  • Is it a mistake, or just the way Stack Overflow renders things, or just me, but why are the quotation marks “ instead of " in the script? Could that be why the script doesn't work? And why ‘ instead of '? Try correcting that and see what happens... –  Dec 24 '12 at 19:54
  • Unless there are any === type/value comparisons, I think it would be okay even to interchange ' and ". But I don't even have a debugger for PHP yet, so I'm not one to talk. –  Dec 24 '12 at 19:58
  • Rewrote and retested the program with standardized ' and ". No difference in result, which is nothing at all. –  Dec 24 '12 at 22:55

2 Answers


You're using some kind of curly (typographic) quote characters, like “ and ‘.

These aren't valid quote characters in PHP. You need to use regular quotes like " and '.

Also, you should configure PHP to show you errors and notices before you think about writing any more code.
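
As a rough sketch of what that could look like during development (assuming you can edit the script itself; the same settings can also be made in php.ini), something like this at the very top of welcome.php should turn a blank page into visible errors and notices:

```php
<?php
// Development-only settings: report every error level and print errors into the page.
error_reporting(E_ALL);
ini_set('display_errors', '1');
// Also show errors raised during PHP's startup sequence.
ini_set('display_startup_errors', '1');
```

One caveat: a parse error in the same file stops the script before these lines run, so for that case `display_errors` has to be enabled in php.ini instead.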

goat
  • Edited all the weird quotes. I can hear my computer working to fetch the data, but I still have the same problem of an empty localhost/welcome?crawlThis=www.google.com –  Dec 24 '12 at 22:49
  • use `var_dump($links);` you should get *something* – goat Dec 24 '12 at 23:36
  • And what does it mean if I still get nothing? My server is working. echo "whatever" prints whatever. But something in my logic seems to be stopping the whole page. Knowing myself, it's probably a minute mistake that I'll overlook 100 times. –  Dec 25 '12 at 02:12
  • "you should configure php to show you errors and notices before you think about writing any more code". Imagine if the Java compiler didn't give you errors but just went silent: that's your current situation. – goat Dec 25 '12 at 02:16
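
For reference, here is a minimal sketch of what error display would likely point at in the question's code. It is an illustration under stated assumptions, not the asker's final code: `MarkupScanner` is a hypothetical stand-in for `Crawler` that takes the markup string directly, so it runs without network access. The relevant points are that `echo` on an object with no `__toString()` method is a fatal error, `echo` on an array prints only `Array`, and `call_user_method()` was removed in PHP 7, so a variable method call is used instead.

```php
<?php
// Hypothetical stand-in for the question's Crawler; it receives markup
// directly instead of fetching a URL, so the sketch needs no network.
class MarkupScanner
{
    protected $markup = '';

    public function __construct($markup)
    {
        $this->markup = (string) $markup;
    }

    public function get($type)
    {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            // Variable method call; stands in for call_user_method().
            return $this->$method();
        }
        return array();
    }

    protected function _get_images()
    {
        // Accepts both <img ...> and self-closed <img ... /> tags.
        preg_match_all('/<img([^>]+?)\/?>/i', $this->markup, $images);
        return $images[1];
    }

    protected function _get_links()
    {
        preg_match_all('/<a\s([^>]+)>(.*?)<\/a>/is', $this->markup, $links);
        return $links[1];
    }
}

// Note: the object itself is not echoed, only assigned.
$scanner = new MarkupScanner('<a href="/about">About</a><img src="logo.png">');
$links   = $scanner->get('links');
$images  = $scanner->get('images');

// Arrays are dumped with print_r() rather than echo.
print_r($links);
print_r($images);
```

If the real class keeps `file_get_contents()`, the input also needs a scheme (`http://www.google.com` rather than `www.google.com`), since without one PHP treats the string as a local file path.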

I'm all for writing it yourself, but there are plenty of documented examples that will do this. Here's a good example that you can follow or use:

crawler example

jsteinmann
  • Any idea why the crawler you provided also shows nothing when I put it in my localhost root folder? –  Dec 24 '12 at 23:00
  • Never mind, I didn't download their library. The thing is, though, I need to do this without any external libraries. I'm being tested on my ability to write production code while learning a new language, and external libraries defeat the purpose of the task. –  Dec 24 '12 at 23:03