I'm not sure whether this is possible. I want a PHP script that, when executed, fetches a page on a different domain, reads its HTML, and extracts the href of each link inside it.

HTML code:

<div id="somediv">
  <a href="http://yahoo.com" class="url">Yahoo</a>
  <a href="http://google.com" class="url">Google</a>
  <a href="http://facebook.com" class="url">Facebook</a>
</div>

The output (which PHP will echo) should be:
http://yahoo.com
http://google.com
http://facebook.com

I have heard that cURL in PHP can do something like this, though not exactly this. I'm a bit confused and hope someone can guide me.

Thanks.

sm21guy
  • Do you have some code that you can show us? – middus Dec 18 '11 at 13:36
  • No, sorry. At the moment I don't even know which PHP function can do this, or whether it can be done at all. – sm21guy Dec 18 '11 at 13:37
  • You need to combine [cURL](http://php.net/manual/en/book.curl.php) and [DOM](http://php.net/manual/en/book.dom.php) for things like this. – DaveRandom Dec 18 '11 at 13:38
  • possible duplicate of [How to implement a web scraper in PHP?](http://stackoverflow.com/questions/26947/how-to-implement-a-web-scraper-in-php) – Shadow The GPT Wizard Dec 18 '11 at 13:38
  • You can use jQuery and Ajax to load the page you want. You don't need Php for this, as it should run on the client – Odys Dec 18 '11 at 13:38
  • possible duplicate of [Convert a (nested)HTML unordered list of links to PHP array of links](http://stackoverflow.com/questions/2617487/convert-a-nestedhtml-unordered-list-of-links-to-php-array-of-links), There are lots of similar questions to yours, this one has a small code example how you can do that (not the best one, but it should work). – hakre Dec 18 '11 at 13:47

2 Answers

Have a look at an HTML parsing library such as PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

ghstcode

Using DOM and XPath:

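<?php
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.example.com/"); // or load from a string using loadHTML()
$xpath = new DOMXPath($doc);
// Select every <a> nested anywhere inside <div id="somediv">
$elements = $xpath->query("//div[@id='somediv']//a");
foreach ($elements as $elem) {
    echo $elem->getAttribute('href'), "\n"; // one URL per line
}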

BTW: you should read up on DOM and XPath.
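
Since the question and one of the comments mention cURL, here is a rough sketch of how cURL and DOM might be combined. The function names fetchHtml and extractLinks are made up for illustration, and the timeout and redirect settings are assumptions rather than recommendations. The demo at the bottom parses the sample markup from the question instead of hitting the network.

```php
<?php
// Hypothetical helper: download a page body with cURL (requires the curl extension).
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects (assumed to be wanted)
        CURLOPT_TIMEOUT        => 10,   // assumed limit: give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: collect the href of every link inside the element
// with the given id. Assumes $containerId contains no quote characters.
function extractLinks(string $html, string $containerId): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query("//*[@id='$containerId']//a[@href]") as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Demo on the markup from the question; no network access needed.
$sample = '<div id="somediv">'
    . '<a href="http://yahoo.com" class="url">Yahoo</a>'
    . '<a href="http://google.com" class="url">Google</a>'
    . '<a href="http://facebook.com" class="url">Facebook</a>'
    . '</div>';

foreach (extractLinks($sample, 'somediv') as $href) {
    echo $href, "\n";
}
```

For a live page, something like extractLinks(fetchHtml('http://www.example.com/'), 'somediv') should work the same way. Fetching with cURL instead of loadHTMLFile gives control over timeouts, redirects and headers, which may matter if the remote site is slow or URL wrappers are disabled on the server.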

middus