I use file_get_contents
to fetch remote pages.
Many of these pages return a 404 error with a customized (and heavy) 404 page.
Is there a way to stop and avoid downloading the whole page when a 404 header is found?
(Maybe curl or wget can do that?)
No, this isn't possible.
HTTP provides some scope for conditional requests (such as If-Modified-Since), but none that trigger on the status code.
The closest you could come would be to make a HEAD request and then, if you don't get an error code back, make a GET request afterwards. You'd probably lose more by issuing two requests for every good resource than you would gain by not downloading the bodies of bad resources.
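A sketch of that HEAD-then-GET approach with PHP's cURL extension; fetch_if_ok() is a name invented for this example, and the URL is whatever page you actually want:

```php
<?php
// Sketch: issue a HEAD request first; only GET the body on 200.
function fetch_if_ok($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        return false; // 404 etc.: the heavy error page is never transferred
    }
    return file_get_contents($url); // second request fetches the actual body
}
```

Note this is exactly the two-requests-per-good-resource trade-off described above.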
I would do the following:
$pageUrl = "http://www.example.com/myfile/which/may/not.exist";

// get_headers() sends a GET by default; switch to HEAD so the
// server doesn't have to produce the full body for this check
stream_context_set_default(array('http' => array('method' => 'HEAD')));

$headers = get_headers($pageUrl);

// Check the status line before downloading
if ($headers[0] == "HTTP/1.1 200 OK") {
    // OK - download
    $download = file_get_contents($pageUrl);
} elseif ($headers[0] == "HTTP/1.1 404 Not Found") {
    // Not OK - show error
}
You could also use strpos() (PHP's indexOf equivalent) instead of comparing the exact status line, since the server may answer with HTTP/1.0 or a different reason phrase.
Based on PHP's manual page for get_headers().
Sample output:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Sat, 29 May 2004 12:28:13 GMT
[2] => Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
[3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
[4] => ETag: "3f80f-1b6-3e1cb03b"
[5] => Accept-Ranges: bytes
[6] => Content-Length: 438
[7] => Connection: close
[8] => Content-Type: text/html
)
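The strpos()-style check mentioned above could look like this; is_ok_status() is a hypothetical helper name, and it only inspects the status line ($headers[0]) returned by get_headers():

```php
<?php
// Hypothetical helper: match on the status code alone, so
// "HTTP/1.0 200 OK" and "HTTP/1.1 200 OK" both pass.
function is_ok_status($statusLine) {
    return preg_match('#^HTTP/\S+\s+200#', $statusLine) === 1;
}
```

Usage: `if (is_ok_status($headers[0])) { $download = file_get_contents($pageUrl); }`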