I'm trying to script the download of a large file, but the script generates a 404 error.

// Get file and save to raw.csv
$url = 'http://spam-ip.com/csv_dump/spam-ip.com_'.date("m-d-Y").'.csv';
//File to save the contents to
$fp = fopen ('raw.csv', 'w+');
//Here is the file we are downloading, replace spaces with %20
$ch = curl_init(str_replace(" ","%20",$url));
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);//get curl response
//done
curl_close($ch);
fclose($fp);

The remote file exists (it's updated every day throughout the day), and I can access it directly via browser. But attempts to access via curl or file_get_contents() (I've tried it both ways) produce 404 errors. Any suggestions on a fix?

sanitycheck

1 Answer

Try this and let me know how it works. It should at least give you a bit of debugging to make sure that the file exists, and what the curl error is.

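// Note: curl_init() only creates a handle; it does not contact the server,
// so this confirms that cURL could be initialised, not that the remote file exists.
function urlExists($url) {
    if (! $fp = curl_init($url)) return false;
    return true;
}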

// URLEncode instead of replacing spaces...
$url = urlencode('http://spam-ip.com/csv_dump/spam-ip.com_'.date("m-d-Y").'.csv');

if (urlExists($url)) 
{
    $ch = curl_init($url);
    $timeout = 5;
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_HEADER         => 0,
        CURLOPT_MAXREDIRS      => 2 )
    );

    $data = curl_exec($ch);

    if (curl_errno($ch)) 
    {
        die('Error: ' . curl_error($ch)); // exits on fail...
    }

    curl_close($ch);

    $fp = fopen('raw.csv', 'w+');
    fwrite($fp, $data);
    fclose($fp);

} else {
    echo "The url: $url does not exist...";
}
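
A real existence check has to send a request and look at the status code that comes back. Here is a minimal sketch of that idea. It assumes the server answers HEAD requests (not every server does) and that any 2xx status counts as "exists". The helper names isSuccessStatus() and remoteFileExists() are made up for this illustration.

```php
<?php
// Hypothetical helper: treat any 2xx HTTP status as success.
function isSuccessStatus($code) {
    return $code >= 200 && $code < 300;
}

// Hypothetical helper: send a HEAD request and inspect the final status code.
function remoteFileExists($url) {
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true, // HEAD request, no body download
        CURLOPT_RETURNTRANSFER => true, // keep any output off the page
        CURLOPT_FOLLOWLOCATION => true, // MAXREDIRS only matters when redirects are followed
        CURLOPT_MAXREDIRS      => 2,
        CURLOPT_CONNECTTIMEOUT => 5,
    ));
    curl_exec($ch);
    // Status of the last response received (0 if no response arrived at all).
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return isSuccessStatus($code);
}
```

Something like remoteFileExists($url) could then stand in for the urlExists($url) call, provided $url is a valid, unencoded URL.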
ehime
  • I got some errors, and I was able to resolve a couple of them. The array wasn't defined in curl_setopt_array, so I defined it. I also added a semi-colon after the $data statement. But now I'm getting, "Error: Couldn't resolve host 'http%3A%2F%2Fspam-ip.com%2Fcsv_dump%2Fspam-ip.com_12-19-2014.csv'" – sanitycheck Dec 19 '14 at 20:44
  • Since the remote URL usually doesn't include any spaces, I decided to try taking urlencode out of the script. The result: the same 404 error. The script never displays the Error message. I did notice one other detail. Before the 404 error, the page generates a 500 error. – sanitycheck Dec 20 '14 at 03:19
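
The "Couldn't resolve host 'http%3A%2F%2F...'" message in the first comment is what happens when urlencode() is applied to the whole URL: the colon and slashes get escaped too, so cURL no longer sees a scheme or a host. A rough sketch of one way around that is to encode only the file name and log the HTTP status that cURL reports. The helper names buildDumpUrl() and downloadTo() are hypothetical. The User-Agent string is only a guess at why a browser might succeed where a script fails; nothing in the thread confirms that is the cause.

```php
<?php
// Hypothetical helper: build the dump URL, percent-encoding only the file name.
function buildDumpUrl($base, $date) {
    $file = 'spam-ip.com_' . $date . '.csv';
    return rtrim($base, '/') . '/' . rawurlencode($file);
}

// Hypothetical helper: stream $url into $target and return the HTTP status code.
function downloadTo($url, $target) {
    $fp = fopen($target, 'w');
    if ($fp === false) {
        return 0; // could not open the local file
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_FILE           => $fp,   // write the body straight to disk
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects, if any
        CURLOPT_FAILONERROR    => true,  // make HTTP >= 400 a cURL error
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; csv-fetch)', // assumption
    ));
    $ok   = curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($ok === false) {
        // Record both the cURL message and the status the remote server sent.
        error_log('cURL error: ' . curl_error($ch) . " (HTTP $code)");
    }
    curl_close($ch);
    fclose($fp);
    return $code;
}
```

Usage would look roughly like `$code = downloadTo(buildDumpUrl('http://spam-ip.com/csv_dump', date('m-d-Y')), 'raw.csv');`. Logging the status separately may help tell a 404 from the remote server apart from a 500 raised by the local script. Note also that date() uses the server's configured timezone, so the generated file name can differ from the one seen in a browser.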