0
<?php

$file = 'http://www.google.com';
$doc = new DOMDocument();
@ $doc->loadHTML(file_get_contents($file));

echo $doc->getElementsByTagName('span')->item(2)->nodeValue;

if (0 != $element->length) 
{
    $content = trim($element->item(2)->nodeValue);
    if (empty($content)) 
    {
        $content = trim($element->item(2)->textContent);
    }
    echo $content . "\n";
}

?>

I'm trying to get the inner content of a span tag from google.com's home page. This code should output the first span tag, but it is not outputting any results.

Shawn
  • Still no luck, eh? Did you double-check that allow_url_fopen is enabled in php.ini? – JP_ Nov 25 '12 at 18:27
  • You are suppressing error messages with the `@`. You should start by removing that. – jeroen Nov 25 '12 at 18:27
  • Yeah, this calls for basic debugging. What does `loadHTML()` return? What does `file_get_contents()` return? What does `$element` contain? What is the value of `$element->length`? – Pekka Nov 25 '12 at 18:28
  • I think using @ here is correct. He is suppressing warnings about invalid HTML markup. – JP_ Nov 25 '12 at 18:28
  • @good4m Yeah, but at the same time he is suppressing any errors `file_get_contents()` might be giving him. – Pekka Nov 25 '12 at 18:29

4 Answers

4

This is not an error: the first span on http://www.google.com is empty, so I'm not sure what else you expect.

 <span class=gbtcb></span> <----------------  item(0)
 <span class=gbtb2></span> <----------------  item(1)
 <span class=gbts>Search</span> <-----------  item(2)

Try

$element = $doc->getElementsByTagName('span')->item(2);
var_dump($element->nodeValue);

Output

string(6) "Search"
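
To see the indexing without depending on the network, here is a minimal sketch that parses a hard-coded copy of the three spans listed above. The `$html` string and the `$texts` array are illustrative assumptions, not Google's actual page; the point is only that `item(0)` and `item(1)` are empty while `item(2)` holds the text.

```php
<?php
// Hypothetical stand-in for the fetched page: the three spans from the listing above.
$html = '<html><body>'
      . '<span class="gbtcb"></span>'
      . '<span class="gbtb2"></span>'
      . '<span class="gbts">Search</span>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$spans = $doc->getElementsByTagName('span');

// Collect the trimmed text of every span, keyed by its zero-based index.
$texts = [];
for ($i = 0; $i < $spans->length; $i++) {
    $texts[$i] = trim($spans->item($i)->textContent);
}

var_dump($texts);
```

Dumping every index like this makes it obvious which spans are empty, rather than guessing at a single `item()` call.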
Baba
  • Yes, the output works, and I appreciate your answer, yet the output on my server is `string(0) ""`, so it's on my end, right? – Shawn Nov 25 '12 at 18:47
  • Very strange; others have reported the same problem. – Shawn Nov 25 '12 at 18:55
  • Run `var_dump(file_get_contents("http://google.com"))` and add the result to pastebin so I can take a look at it. – Baba Nov 25 '12 at 18:57
  • It returns an empty string for me. Must be a configuration problem. Works fine on the pastebin. – JP_ Nov 25 '12 at 18:59
  • Add `error_reporting(E_ALL);` and `ini_set('display_errors','On');` at the top of the page and try again. – Baba Nov 25 '12 at 19:01
  • Yes, it is a server configuration issue. I tried it locally and it works fine; however, on my webhost it doesn't. I'm writing up an alternative cURL snippet for him now. – JP_ Nov 25 '12 at 19:07
  • I tried several sites and it still returns empty. – JP_ Nov 25 '12 at 19:11
  • Add your `phpinfo()` output to pastebin and let me see. – Baba Nov 25 '12 at 19:12
  • And can you just restart `apache` and `php`? – Baba Nov 25 '12 at 19:13
0

First, bear in mind that the HTML is not necessarily valid XML.

That aside, check that you're actually getting some contents to parse; you need to have allow_url_fopen enabled in order to use file_get_contents() with URLs.

In general, avoid using the error suppression operator (@) because it will almost certainly come back to bite you some time (and this time might well be that time); there is a discussion on this elsewhere on SO.

So, as a first step, switch to something like the following and let me know whether you're getting any contents at all.

// stop using @ to suppress errors
$contents = file_get_contents($file);
// check that you're getting something to parse
echo $contents;
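
Going one step further, the checks above could be wrapped up roughly as follows. This is only a sketch: `fetch_page()` is a hypothetical helper (not a PHP built-in), and the usage line deliberately points at a missing local file so the failure path can be seen offline; in real use you would pass the URL instead.

```php
<?php
// fetch_page() is a hypothetical helper: it returns [contents, error],
// where exactly one of the two is null.
function fetch_page(string $source): array
{
    // URLs only work with file_get_contents() when allow_url_fopen is on.
    $isUrl = (bool) preg_match('#^https?://#i', $source);
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($isUrl && !$urlFopen) {
        return [null, 'allow_url_fopen is disabled in php.ini'];
    }

    // No @ here: let PHP emit its warning, then read it back.
    $contents = file_get_contents($source);
    if ($contents === false) {
        $last = error_get_last();
        return [null, $last['message'] ?? 'unknown error'];
    }
    if ($contents === '') {
        return [null, 'the request succeeded but the body is empty'];
    }
    return [$contents, null];
}

// Deliberately missing local file, so the example fails without any network access.
[$contents, $error] = fetch_page(__DIR__ . '/no-such-page.html');

if ($error !== null) {
    echo "Nothing to parse: $error\n";
} else {
    echo strlen($contents) . " bytes fetched\n";
}
```

Separating "fetch failed", "fetch returned nothing" and "fetch worked" tells you whether the problem is in the download or in the DOM parsing.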
El Yobo
0

Try this and tell us what the output is

<?php
// var_dump() shows the raw setting; "1" means URL wrappers are enabled
var_dump(ini_get('allow_url_fopen'));
?>
JP_
0

Try using cURL to get the data and then load it into a DOMDocument:

<?php
$url = "http://www.google.com";
$timeout = 5;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
if ($data === false)
{
    // report the transfer error instead of parsing nothing
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

$dom = new DOMDocument();
@$dom->loadHTML($data); // the @ suppresses warnings about invalid markup

// fetch the span list once; $element was previously undefined
$element = $dom->getElementsByTagName('span');

if ($element->length > 2) 
{
    $content = trim($element->item(2)->nodeValue);
    if (empty($content)) 
    {
        $content = trim($element->item(2)->textContent);
    }
    echo $content . "\n";
}

?>
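
If you would rather not use `@` in front of `loadHTML()`, one alternative is libxml's internal error buffer. The sketch below is an assumption about how that could look, not a drop-in replacement: the `$html` sample is hypothetical, and whether it triggers any markup warnings at all depends on the libxml2 version installed.

```php
<?php
// Hypothetical sample markup; some libxml2 versions warn about the HTML5 <section> tag.
$html = '<html><body><section>'
      . '<span></span><span></span><span>Search</span>'
      . '</section></body></html>';

// Buffer libxml warnings instead of emitting them, remembering the old setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Markup problems are collected here and can be logged or ignored on purpose.
$markupErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$spans = $dom->getElementsByTagName('span');
$third = $spans->length > 2 ? trim($spans->item(2)->textContent) : null;

echo count($markupErrors) . " markup warning(s) collected\n";
var_dump($third);
```

This only silences the parser's complaints about markup, so errors from the download step (cURL or `file_get_contents()`) remain visible.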
JP_