I want to get all the links on a page so that I can read attributes such as the title of each a href.

<?php
function exception_handler($exception) {
  echo "Uncaught exception: " , $exception->getMessage(), "\n";
}

set_exception_handler('exception_handler');

function dom_create()
{
  echo("domcreate");
  $file = file_get_html('http://www.facebook.com/plugins/fan.php?connections=100&id=40796308305');
  echo($file);
  $doc = new DOMDocument();
  $doc->loadHTMLFile($file);

  $xpath = new DOMXpath($doc);

  $elements = $xpath->query("//*[@id]");


  if (!is_null($elements)) {
    foreach($elements as $e){
      $documentLinks = $e->getElementsByTagName('a');
    }
    else
      echo "NULL";
  }
}

dom_create();
?>

I don't get any output, not even from plain echo statements. Does anyone have an idea?

user1007522

2 Answers

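Your braces are misplaced: the else sits inside the if block, right after the foreach, which is a syntax error. It has to follow the closing brace of the if: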

if (!is_null($elements)) {
    foreach($elements as $e){
        $documentLinks = $e->getElementsByTagName('a');
        // perhaps add echo here if you want to output the links somehow
    }       
} else {
  echo "NULL";
}
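
This may also explain why nothing was printed at all: a syntax error stops PHP from compiling the file, so not even the first echo runs, and with display_errors switched off the page simply stays blank. A minimal sketch of that behaviour, assuming PHP 7 or later (where eval() raises a catchable ParseError); the names $snippet and $caught are illustrative only:

```php
<?php
// The misplaced "else" from the question, kept in a string so that
// this file itself still parses.
$snippet = 'if (true) { foreach (array() as $e) { } else echo "NULL"; }';

$caught = false;
try {
    // eval() compiles the string first; the stray "else" fails at that stage.
    eval($snippet);
} catch (ParseError $e) {
    // PHP 7+ reports the syntax error as a ParseError instead of a fatal error.
    $caught = true;
    echo 'Parse error: ', $e->getMessage(), PHP_EOL;
}
```

On the command line, `php -l yourfile.php` (with your own file name) checks a script for this kind of error without executing it.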
mareckmareck
  • Thanks, I didn't notice that. Weird that I didn't get any errors, though. But still, the foreach never runs. Am I fetching my HTML in the wrong manner? – user1007522 May 19 '14 at 09:36

I solved it by fetching the page with file_get_contents and passing it a stream context that sets a User-Agent header.

<?php
function exception_handler($exception) {
  echo "Uncaught exception: " , $exception->getMessage(), "\n";
}

set_exception_handler('exception_handler');

function dom_create()
{
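  // Send a browser-like User-Agent header with the request.
  $context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0')));
  $file = file_get_contents('http://www.facebook.com/plugins/fan.php?connections=100&id=6568341043637', false, $context);
  if ($file === false) {
    throw new RuntimeException('Could not fetch the page');
  }
  $dom = new DOMDocument;
  // Collect libxml warnings about malformed markup instead of printing them.
  libxml_use_internal_errors(true);
  $dom->loadHTML($file);
  foreach ($dom->getElementsByTagName('a') as $node) {
    // Print the markup of each link on its own line.
    echo $dom->saveHTML($node), PHP_EOL;
  }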
}

dom_create();
?>
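
To read individual attributes such as title or href instead of printing the whole tag, getAttribute() on each node should be enough. The following is only a sketch under stated assumptions: extract_links() is a hypothetical helper name, and the sample markup is made up so that the example runs without a live request.

```php
<?php
// Hypothetical helper: collect href, title and text of every <a> in an HTML string.
function extract_links($html)
{
    $dom = new DOMDocument;
    // Collect warnings about malformed markup internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        // getAttribute() returns an empty string when the attribute is absent.
        $links[] = array(
            'href'  => $a->getAttribute('href'),
            'title' => $a->getAttribute('title'),
            'text'  => trim($a->textContent),
        );
    }
    return $links;
}

// Made-up sample markup standing in for the fetched page.
$sample = '<html><body><a href="/one" title="First">One</a> <a href="/two">Two</a></body></html>';
$links = extract_links($sample);

foreach ($links as $link) {
    echo $link['href'], ' | ', $link['title'], ' | ', $link['text'], PHP_EOL;
}
```

In the code above, the same loop could be fed the string returned by file_get_contents in place of $sample.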
user1007522
  • Ah yeah, I might be wrong, but I think it's because file_get_html already returns a node object, so you don't need to pass it to $dom->loadHTML. See here: http://stackoverflow.com/questions/18667441/simple-html-dom-file-get-html-not-working-is-there-any-workaround – mareckmareck May 19 '14 at 10:49