cURL not getting HTML source of URL

Question

I am trying to make a simple web crawler with PHP and I am having issues getting the HTML source of a given URL. I am currently using cURL to get the source.

My code:

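    <?php
    $url = "http://www.nytimes.com/";

    // Fetch the body of the given URL with cURL and return it as a string.
    function url_get_contents($Url) {
        if (!function_exists('curl_init')) {
            die('CURL is not installed!');
        }
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $Url);
        // Return the response as a string instead of printing it directly.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $output = curl_exec($ch);
        // curl_exec() returns false on failure; report the reason.
        if ($output === false) { die(curl_error($ch)); }
        curl_close($ch);
        return $output;
    }

    echo url_get_contents($url);
    ?>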

Right now nothing gets echoed and there aren't any errors, so it is a bit of a mystery. Any suggestions or fixes would be appreciated.

Edit: I added

if ($output === false) { die(curl_error($ch)); }

to the middle of the function and it ended up giving me an error (finally!):

Could not resolve host: www.nytimes.com

I still do not really know what the problem is. Any ideas?

Thanks

you never bothered checking if curl succeeded. `if ($output === false) { die(curl_error($ch)); }` — Marc B, Jun 25 '15 at 21:21
http://stackoverflow.com/questions/6516902/how-to-get-response-using-curl-in-php should help. — Scalable, Jun 25 '15 at 21:22
Probably nytimes.com has something to prevent web crawling. Have you tried with a different url? — Alvaro Flaño Larrondo, Jun 25 '15 at 22:58
@AlvaroFlañoLarrondo False. `curl -i http://www.nytimes.com/` returns an `HTTP/1.1 200` response. — Asaph, Jun 25 '15 at 23:47

score 2 · Accepted Answer · answered Jun 25 '15 at 23:47

It turns out that it was not a cURL problem.

My host server (an Ubuntu VM) was using a "host-only" network adapter, which blocked access to every IP or domain outside its host machine, so cURL could not connect to any external URL.

Once I changed it to a "bridged" network adapter, the VM had access to the outside world.
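
For anyone who hits the same `Could not resolve host` error, one way to tell a network problem from a cURL problem is to check name resolution from PHP before making the request. This is only a minimal sketch: `can_resolve_host` is a hypothetical helper name, not something from the original code, and `gethostbyname()` only looks up IPv4 addresses.

```php
<?php
// Hypothetical helper (not part of the original code): returns true if
// $host is an IP literal or can be resolved to an IPv4 address.
function can_resolve_host(string $host): bool
{
    // An IP address needs no DNS lookup.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }
    // gethostbyname() returns the unmodified host name when resolution fails.
    return gethostbyname($host) !== $host;
}

$host = parse_url('http://www.nytimes.com/', PHP_URL_HOST);
if (can_resolve_host($host)) {
    echo "$host resolves; the problem is probably elsewhere.\n";
} else {
    echo "Cannot resolve $host: check the VM's network adapter and DNS settings.\n";
}
```

If the lookup fails here as well, the issue is most likely in the machine's network or DNS configuration rather than in the cURL options.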

Hope this helps.

score 0 · Answer 2 · answered Jun 25 '15 at 21:24

Variable case mismatch ($url vs. $Url). Change:

function url_get_contents($Url) {

to

function url_get_contents($url) {
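
PHP variable names are case-sensitive, so `$Url` and `$url` are two distinct variables. If a parameter is declared as one and the function body uses the other, the body sees an undefined variable. A small illustration, with function names made up for this example:

```php
<?php
// Parameter is $Url, but the body reads $url (lowercase): a different,
// undefined variable inside this function.
function with_mismatch($Url)
{
    return isset($url) ? $url : null;
}

// Parameter and body use the same name, so the value is passed through.
function with_match($url)
{
    return $url;
}

var_dump(with_mismatch('http://example.com/')); // NULL
var_dump(with_match('http://example.com/'));    // the URL string
```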

Asaph


The two variables are used in different contexts, inside and outside the function. Plus, the edited question shows that the URL is correctly read. – Alvaro Flaño Larrondo Jun 25 '15 at 22:56
@AlvaroFlañoLarrondo This answer was posted prior to the question edit, at a time when the variable names *did not align within the function*. I was keenly aware that there are 2 variables in two different contexts. – Asaph Jun 25 '15 at 23:44
