
I use file_get_contents to fetch remote pages. Many of the pages return a 404 error with a customized (and heavy) 404 page.

Is there a way to stop and not download the whole page when a 404 header is found?

(maybe curl or wget can do that ?)

Álvaro González
yarek

2 Answers


No, this isn't possible.

HTTP provides some scope for conditional requests (such as If-Modified-Since), but none that trigger on the status code.

The closest you could come would be to make a HEAD request and then, if you don't get an error code back, make a GET request afterwards. You'd probably lose more by making two requests for every good resource than you would gain by not fetching the bodies of bad resources.
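A minimal sketch of that HEAD-then-GET approach using PHP's stream contexts (the URL is a placeholder, and the status check is deliberately loose):

```php
<?php
// Hypothetical URL used only for illustration
$url = "http://www.example.com/page.html";

// Send a HEAD request: only headers travel over the wire, no body.
// ignore_errors keeps file_get_contents from failing on 4xx/5xx,
// and @ suppresses the warning on network errors.
$context = stream_context_create([
    'http' => ['method' => 'HEAD', 'ignore_errors' => true],
]);
@file_get_contents($url, false, $context);

// PHP populates $http_response_header after the call; the first
// entry is the status line, e.g. "HTTP/1.1 404 Not Found".
if (isset($http_response_header[0])
    && strpos($http_response_header[0], '200') !== false) {
    // Only now fetch the full body with a normal GET
    $body = file_get_contents($url);
}
```

Note that matching on the substring '200' rather than the whole status line avoids depending on the exact reason phrase, which varies between servers.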

Quentin
  • http://stackoverflow.com/questions/1545432/what-is-the-easiest-way-to-use-the-head-command-of-http-in-php – Quentin Nov 16 '15 at 15:18

I would do the following:

$pageUrl = "http://www.example.com/myfile/which/may/not.exist";
// fetch only the response headers first
$headers = get_headers($pageUrl);
// check the status line before downloading the body;
// match on the code, since the reason phrase ("Not Found" vs "NOT FOUND")
// varies between servers
if (strpos($headers[0], '200') !== false) {
  // OK - download
  $download = file_get_contents($pageUrl);
} elseif (strpos($headers[0], '404') !== false) {
  // NOT OK - show error
}

You could also use strpos() (PHP's equivalent of indexOf) instead of comparing the whole status line.

Based on PHP's manual page for get_headers().

Sample output:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Sat, 29 May 2004 12:28:13 GMT
    [2] => Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    [4] => ETag: "3f80f-1b6-3e1cb03b"
    [5] => Accept-Ranges: bytes
    [6] => Content-Length: 438
    [7] => Connection: close
    [8] => Content-Type: text/html
)
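One caveat: get_headers() issues a GET request by default, so the server still starts sending the body even if PHP only reads the headers. Since PHP 7.1 you can pass a stream context as the third argument to force a HEAD request instead. A sketch, with a placeholder URL:

```php
<?php
// Force get_headers() to use HEAD so no body is transferred
// (context parameter available since PHP 7.1; hypothetical URL)
$context = stream_context_create(['http' => ['method' => 'HEAD']]);
$headers = @get_headers("http://www.example.com/myfile", false, $context);

// get_headers() returns false on failure, so guard before indexing
if ($headers !== false && strpos($headers[0], '200') !== false) {
    // safe to download the body now
}
```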
JoSSte