code not parsing through a simple google.com test

Question

    <?php

$file = 'http://www.google.com';
$doc = new DOMDocument();
@ $doc->loadHTML(file_get_contents($file));

echo $doc->getElementsByTagName('span')->item(2)->nodeValue;

if (0 != $element->length) 
{
    $content = trim($element->item(2)->nodeValue);
    if (empty($content)) 
    {
        $content = trim($element->item(2)->textContent);
    }
    echo $content . "\n";
}

?>

I'm trying to get the inner content of a span tag from google.com's home page. This code should output the first span tag, but it produces no output at all. Why?

Still no luck, eh... did you double-check that allow_url_fopen is enabled in php.ini? – JP_, Nov 25 '12 at 18:27
You are suppressing error messages with the `@`. You should start by removing that. – jeroen, Nov 25 '12 at 18:27
Yeah, this calls for basic debugging. What does `loadHTML()` return? What does `file_get_contents()` return? What does `$element` contain? What is the value of `$element->length`? – Pekka, Nov 25 '12 at 18:28
I think using @ here is correct. He is suppressing warnings about invalid HTML markup. – JP_, Nov 25 '12 at 18:28
@good4m yeah, but at the same time he is suppressing any errors `file_get_contents()` might be giving him. – Pekka, Nov 25 '12 at 18:29

Baba · Accepted Answer · 2012-11-25T18:56:36.393

4

This is not an error: the first span on http://www.google.com is empty, so I am not sure what else you expect.

 <span class=gbtcb></span> <----------------  item(0)
 <span class=gbtb2></span> <----------------  item(1)
 <span class=gbts>Search</span> <-----------  item(2)

Try

$element = $doc->getElementsByTagName('span')->item(2);
var_dump($element->nodeValue);

Output

string(6) "Search"
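To see how the `item()` indexes line up with the spans, here is a minimal sketch that runs against an inline copy of the three spans quoted above rather than the live page. The `$html` string and the `$texts` array are illustrative assumptions, not part of the original code.

```php
<?php
// Illustrative markup: the three spans quoted in this answer, wrapped in a minimal page.
$html = '<html><body>'
      . '<span class="gbtcb"></span>'
      . '<span class="gbtb2"></span>'
      . '<span class="gbts">Search</span>'
      . '</body></html>';

libxml_use_internal_errors(true); // collect markup warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$spans = $doc->getElementsByTagName('span');
$texts = [];
for ($i = 0; $i < $spans->length; $i++) {
    // item($i) is zero-based, so item(2) is the third span
    $texts[$i] = trim($spans->item($i)->textContent);
    printf("item(%d): %s\n", $i, var_export($texts[$i], true));
}
```

Running the same loop on the HTML your server actually receives shows whether any spans arrive at all.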

edited Nov 25 '12 at 18:56

answered Nov 25 '12 at 18:28


Yes, the output works, and I appreciate your answer, yet the output on my server is `string(0) ""`, so the problem is on my end, right? – Shawn Nov 25 '12 at 18:47
Very strange, others have reported the same problem... – Shawn Nov 25 '12 at 18:55
Run `var_dump(file_get_contents("http://google.com"))` and add the result to pastebin so I can take a look at it. – Baba Nov 25 '12 at 18:57
It returns an empty string for me. Must be a configuration problem. Works fine on the pastebin. – JP_ Nov 25 '12 at 18:59
Add `error_reporting(E_ALL);` and `ini_set('display_errors','On');` at the top of the page and try again. – Baba Nov 25 '12 at 19:01
Yes, it is a server configuration issue. I tried it locally and it works fine; however, on my webhost it doesn't. I'm writing up an alternative cURL snippet for him now. – JP_ Nov 25 '12 at 19:07
I tried several sites and it still returns empty. – JP_ Nov 25 '12 at 19:11
Add your `phpinfo` output to pastebin and let me see. – Baba Nov 25 '12 at 19:12
And can you just restart `apache` and `php`? – Baba Nov 25 '12 at 19:13

score 0 · Answer 2 · edited May 23 '17 at 12:27

First, bear in mind that the HTML is not necessarily valid XML.

That aside, check that you're actually getting some contents to parse; you need to have allow_url_fopen enabled in order to use file_get_contents() with URLs.

In general, avoid using the error suppression operator (@) because it will almost certainly come back to bite you some time (and this time might well be that time); there is a discussion on this elsewhere on SO.

So, as a first step, switch to something like the following and let me know whether you're getting any contents at all.

// stop using @ to suppress errors
$contents = file_get_contents($file);
// check that you're getting something to parse
echo $contents;
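A slightly fuller sketch of the same check, for cases where `echo` prints nothing and you need to know why. `fetchContents()` is a hypothetical helper introduced here for illustration. It only relies on `file_get_contents()` returning `false` on failure and on `error_get_last()` holding the most recent warning.

```php
<?php
// Show every warning and notice while debugging.
error_reporting(E_ALL);
ini_set('display_errors', 'On');

// fetchContents() is a hypothetical helper, not part of the original code.
function fetchContents(string $source): string
{
    $contents = file_get_contents($source); // no @, so warnings stay visible
    if ($contents === false) {
        $error = error_get_last();
        throw new RuntimeException('Fetch failed: ' . ($error['message'] ?? 'unknown error'));
    }
    if ($contents === '') {
        // The request "worked" but there is nothing to hand to DOMDocument
        throw new RuntimeException('Fetch succeeded but returned an empty body');
    }
    return $contents;
}
```

Calling `fetchContents($file)` in place of the bare `file_get_contents($file)` makes an empty or failed download stop the script with a message instead of silently producing an empty document.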

score 0 · Answer 3 · answered Nov 25 '12 at 18:40

Try this and tell us what the output is

<?php
// var_dump() makes an empty value visible, whereas echo would print nothing for it
var_dump(ini_get('allow_url_fopen'));
?>
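If that setting looks fine, it may help to check the other things a URL fetch depends on in one go. This is a rough sketch. The `$capabilities` array is an illustrative name, and the three checks are an assumption about what is worth inspecting, not a complete diagnosis.

```php
<?php
// Summarise which URL-fetching options this PHP installation offers.
$capabilities = [
    // ini_get() returns a string such as "1" or "", so normalise it to a bool
    'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
    // The http:// stream wrapper must be registered for file_get_contents() on URLs
    'http_wrapper'    => in_array('http', stream_get_wrappers(), true),
    // Needed for a cURL-based fetch
    'curl_extension'  => extension_loaded('curl'),
];
var_dump($capabilities);
```

Even when all three are `true`, the host can still block outbound requests at the network level, so an empty response is not ruled out.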

JP_


Ok so we can rule out that being the problem. – JP_ Nov 25 '12 at 18:49
I'm confused about how the codepad script above works while on my server it doesn't :/ – Shawn Nov 25 '12 at 18:51

JP_ · Answer 4 · 2012-11-25T19:25:19.253

Try using cURL to get the data and then load it into a DOMDocument:

<?php
$url = "http://www.google.com";
$timeout = 5;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
if ($data === false || $data === '') {
    // Report the transport problem instead of parsing an empty response
    die('No data received: ' . curl_error($ch) . "\n");
}
curl_close($ch);

$dom = new DOMDocument();
@$dom->loadHTML($data); // The @ hides warnings about invalid markup

// $element was never assigned in the original snippet; fetch the span list once
$element = $dom->getElementsByTagName('span');

if ($element->length > 2) // item(2) exists only when there are at least three spans
{
    $content = trim($element->item(2)->nodeValue);
    if (empty($content)) 
    {
        $content = trim($element->item(2)->textContent);
    }
    echo $content . "\n";
}

?>
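If you would rather not use `@` at all, libxml can be told to collect markup warnings internally, which keeps real PHP errors visible. The sketch below assumes a hypothetical helper called `spanText()`; the name, signature and null-on-missing behaviour are illustrative choices, not an established API.

```php
<?php
// spanText() is a hypothetical helper: it returns the trimmed text of the
// span at $index, or null when the input is empty or the span does not exist.
function spanText(string $html, int $index): ?string
{
    if ($html === '') {
        return null; // nothing was fetched, so there is nothing to parse
    }

    $previous = libxml_use_internal_errors(true); // collect markup warnings quietly
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    if ($loaded === false) {
        return null;
    }

    $span = $dom->getElementsByTagName('span')->item($index);
    return $span === null ? null : trim($span->textContent);
}

var_dump(spanText('<p><span>a</span><span> Search </span></p>', 1)); // string(6) "Search"
var_dump(spanText('<p><span>a</span></p>', 5));                      // NULL
```

With the cURL code above, `spanText($data, 2)` would return `null` instead of raising a fatal error when the page has fewer than three spans.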
