I am trying to fetch page contents from a remote site. This works for many sites, but some URLs, such as http://www1.macys.com/, return nothing. Can anyone tell me what the problem is or how to fix it? Am I missing anything?

If I use fopen() or file_get_contents() instead, I get the warning "Redirection limit reached, aborting".

Below is my code.

<?php
    $url = 'http://www1.macys.com/shop/product/volcom-stripe-thermal-shirt?ID=1155481&CategoryID=30423#fn=sp%3D1%26spc%3D996%26ruleId%3D27%26slotId%3D1';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0');
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $contents = curl_exec($ch);

    if(curl_errno($ch)) {
        echo 'Error: ' . curl_error($ch) . '<br><br>';
    }

    echo 'Contents: '; print_r($contents); echo '<br><br>';
    curl_close($ch);
?>
Wooble
user3350854
  • Not sure what you mean by trying to get images. If you just open the URL in a browser, it shows HTML content; it does not look like it returns data such as JSON or XML that you could parse. – Abhik Chakraborty Feb 25 '14 at 11:45

4 Answers

Maybe it's a redirect issue. Try adding this:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

This option lets cURL follow redirects.

Edit:

Also add this:

curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__).DIRECTORY_SEPARATOR.'cookie.txt');

Remember to make cookie.txt writable by the PHP process (for example, chmod 777, though tighter permissions are safer if they still allow writing).
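
As a rough sketch of how the two suggestions above fit together, the options can be collected in one array and applied with curl_setopt_array(). The helper names redirect_options() and fetch_following_redirects(), the cap of 10 redirects, and the timeout values are assumptions for illustration, not part of the original answer. Pointing CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR at the same file lets cURL send back cookies it received earlier in the redirect chain, which can be enough to stop a site from redirecting in a loop.

```php
<?php
// Hypothetical helper: builds cURL options that follow redirects and keep cookies.
function redirect_options($url, $cookieFile, $maxRedirects = 10)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many redirects
        CURLOPT_COOKIEFILE     => $cookieFile,   // read cookies from this file
        CURLOPT_COOKIEJAR      => $cookieFile,   // write cookies back when the handle closes
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
    );
}

// Hypothetical helper: runs the request and returns the body, or false on failure.
function fetch_following_redirects($url, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, redirect_options($url, $cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        error_log('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);
    return $body;
}
```

It could be called as fetch_following_redirects($url, dirname(__FILE__) . DIRECTORY_SEPARATOR . 'cookie.txt'), with $url being the address from the question.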

Alberto Fecchi
  • When I use "curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);" I get the error "Maximum (20) redirects followed" – user3350854 Feb 25 '14 at 11:57
  • Add this line: curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__).DIRECTORY_SEPARATOR.'cookie.txt'); and remember to chmod cookie.txt to 777 – Alberto Fecchi Feb 25 '14 at 12:02

Some websites won't feed images unless you maintain a cookie jar.

Try this: (from: https://stackoverflow.com/a/12885587/2167896)

// The cookie jar is a file path that cURL reads cookies from and writes them to.
$jar = tempnam(sys_get_temp_dir(), 'cookie');
$output = fetch('http://www.google.com', array('cookiefile' => $jar));

// $z is an optional array of settings: useragent, post, refer, timeout, cookiefile.
function fetch( $url, $z=null ) {
    $ch = curl_init();

    $useragent = isset($z['useragent']) ? $z['useragent'] : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2';

    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_POST, isset($z['post']) );

    if( isset($z['post']) )         curl_setopt( $ch, CURLOPT_POSTFIELDS, $z['post'] );
    if( isset($z['refer']) )        curl_setopt( $ch, CURLOPT_REFERER, $z['refer'] );

    curl_setopt( $ch, CURLOPT_USERAGENT, $useragent );
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, ( isset($z['timeout']) ? $z['timeout'] : 5 ) );

    // Only enable cookie handling when a cookie file was supplied.
    if( isset($z['cookiefile']) ) {
        curl_setopt( $ch, CURLOPT_COOKIEJAR,  $z['cookiefile'] );
        curl_setopt( $ch, CURLOPT_COOKIEFILE, $z['cookiefile'] );
    }

    $result = curl_exec( $ch );
    curl_close( $ch );
    return $result;
}
Michael Benjamin

If the code works with other URLs, the specific server may be blocking your cURL requests. Try fopen().

Or add appropriate headers and a referer. This is what I have used:

    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; //browsers keep this blank.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true); // has no effect since PHP 5.1.3
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $contents = curl_exec($ch);
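
To tell whether the server is refusing the request or just redirecting it, it can help to look at what cURL recorded about the transfer. The sketch below summarizes a few fields of the array returned by curl_getinfo(); the function name describe_transfer() and the wording of the messages are illustrative assumptions, not something from the answer above.

```php
<?php
// Hypothetical helper: turns a curl_getinfo() array into a one-line summary.
function describe_transfer(array $info)
{
    $code      = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    $redirects = isset($info['redirect_count']) ? (int) $info['redirect_count'] : 0;
    $finalUrl  = isset($info['url']) ? $info['url'] : '';

    if ($code === 0) {
        $status = 'no HTTP response';          // e.g. timeout or connection failure
    } elseif ($code >= 300 && $code < 400) {
        $status = 'stopped on a redirect';     // redirects not followed, or limit reached
    } elseif ($code >= 400) {
        $status = 'request refused or failed'; // client or server error status
    } else {
        $status = 'ok';
    }

    return sprintf(
        '%s: HTTP %d after %d redirect(s), final URL %s',
        $status,
        $code,
        $redirects,
        $finalUrl
    );
}
```

It would be called as echo describe_transfer(curl_getinfo($ch)); after curl_exec() and before curl_close().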
Revolution88

Try setting a User-Agent string, such as your API username, website name, or something similar:

curl_setopt($ch, CURLOPT_USERAGENT, 'MY-NAME');