15

Suppose you have a thumbnail generator script that accepts source images in the form of a URL. Is there a way to detect whether the source URL is "broken", i.e. whether it is nonexistent or leads to a non-image file?


Simply brute-forcing with getimagesize() or another PHP GD function is not a solution, since spoofed stray URLs that might not be images at all could be supplied (http://example.com/malicious.exe, or the same file renamed to http://example.com/malicious.jpg). PHP could easily detect such cases before GD has to be invoked. I'm looking for a way to pre-sanitize the input before GD spends any effort trying to parse the file.


As a first step, the following regular expression checks whether the URL ends in an image extension: preg_match('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)([^\s]+(\.(?i)(jpg|png|gif|bmp))$)@', $txt, $url);
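
The same extension pre-check can also be sketched without one large regular expression, by letting parse_url() and pathinfo() split the URL. This is only an illustration: the function name hasImageExtension() and the extension list are assumptions, not part of the original script. Like the regex, it only looks at the name, so a renamed file still passes.

```php
<?php
// Hypothetical helper (not from the original script): returns true when the
// URL uses http/https and its path ends in one of the allowed extensions.
function hasImageExtension(string $url, array $allowed = ['jpg', 'jpeg', 'png', 'gif', 'bmp']): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $path   = parse_url($url, PHP_URL_PATH);

    // parse_url() returns null (or false) when a component is missing or the URL is malformed.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }
    if (!is_string($path)) {
        return false;
    }

    // The query string is not part of the path, so "a.png?x=1" still yields "png".
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

// The renamed executable passes this check: the name alone proves nothing.
var_dump(hasImageExtension('http://example.com/malicious.jpg'));
var_dump(hasImageExtension('http://example.com/malicious.exe'));
```

A check like this can only weed out obviously wrong URLs cheaply; it says nothing about the actual content behind the URL.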

ina
  • 1
    what's wrong with malicious.exe renamed to malicious.jpg? Tip: there is nothing wrong. in the context of thumbnails creation it's as dangerous as a plain text file. – Your Common Sense Sep 10 '10 at 17:39
  • 2
    `getimagesize()` is neither "brute force" nor a GD function. It is exactly what you are looking for, judging by what you say in the second paragraph. – Pekka Sep 10 '10 at 20:46
  • 1
    How would you handle *changing* references? The URL might be a valid image when you create the thumb, but then might be changed later on. – Arjan Sep 12 '10 at 14:16

9 Answers

11

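The user notes on PHP's file_exists() documentation page show how to check URLs. Note that file_exists() itself does not work on http:// URLs, because the HTTP wrapper does not support stat().

See the notes below; they show how to check an image, which is exactly what you need.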

FILE EXISTS - http://www.php.net/manual/en/function.file-exists.php#93572

URL EXISTS - http://www.php.net/manual/en/function.file-exists.php#85246


Here is alternative code for checking the URL. If you test it in a browser, replace \n with <br/>.

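<?php

$urls = array('http://www.google.com/images/logos/ps_logo2.png', 'http://www.google.com/images/logos/ps_logo2_not_exists.png');

foreach ($urls as $url) {
    echo "$url - ";
    echo url_exists($url) ? "Exists" : 'Not Exists';
    echo "\n\n";
}

// Returns true if the first status line of the response is a 2xx code.
function url_exists($url) {
    // get_headers() returns false if the request fails
    $hdrs = @get_headers($url);

    // Debug output: the second header line (Content-Type in the output below)
    echo @$hdrs[1]."\n";

    // preg_match() returns an int, so compare explicitly to return a bool
    return is_array($hdrs) && preg_match('/^HTTP\\/\\d+\\.\\d+\\s+2\\d\\d\\s+.*$/', $hdrs[0]) === 1;
}
?>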

Output is as follows

http://www.google.com/images/logos/ps_logo2.png - Content-Type: image/png
Exists

http://www.google.com/images/logos/ps_logo2_not_exists.png - Content-Type: text/html; charset=UTF-8
Not Exists
Alex
  • 2
    I use this all the time for detecting includes in my home-brew MVC setup. Works great and you can easily point to a default not-found image if the file doesn't exist. – smdrager Sep 09 '10 at 15:24
  • 1) `file_exists` can only check for local files (?) 2) also.. url_exists looks like it is just invoking `curl_exec` -- does `curl_exec` return false if the $url is "broken" or (more specifically) is the wrong header type? – ina Sep 09 '10 at 15:36
  • @ina see modification to my comment. I added url_exists method and example. Hope this helps you. – Alex Sep 09 '10 at 16:19
  • get_headers would take long; use cURL with curl_multi_init – RobertPitt Sep 10 '10 at 16:41
7

I have used the following to detect attributes for remote images

$src='http://example.com/image.jpg';
list($width, $height, $type, $attr) = @getimagesize($src);

Example (checking Stack Overflow's "Careers 2.0" image):

$src='http://sstatic.net/ads/img/careers2-ad-header-so.png';
list($width, $height, $type, $attr) = @getimagesize($src);

echo '<pre>';
echo $width.'<br>';
echo $height.'<br>';
echo $type.'<br>';
echo $attr.'<br>';
echo '</pre>';

If $width, $height, etc. are null, the URL either does not point to an image or the file does not exist. Using cURL is overkill and slower (even with CURLOPT_HEADER).

4

The only really reliable way is to request the image using file_get_contents() and find out its image type using getimagesize().

Only if getimagesize() returns a valid file type can you rely on it actually being a valid image.

This is quite resource heavy, though.
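
A minimal sketch of that check, assuming PHP 5.4 or later so that getimagesizefromstring() is available (it inspects bytes already in memory, so the file is fetched only once). The helper name detectImageMime() is made up for illustration, and the usage lines assume allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: returns the detected MIME type (e.g. "image/png"),
// or null when the bytes are not a recognised image format.
function detectImageMime(string $bytes): ?string
{
    // getimagesizefromstring() lives in ext/standard and does not need GD.
    // It returns false (and may emit a notice) for non-image data.
    $info = @getimagesizefromstring($bytes);
    if ($info === false || !isset($info['mime'])) {
        return null;
    }
    return $info['mime'];
}

// Usage with a remote file (network access and allow_url_fopen assumed):
// $bytes = @file_get_contents('http://example.com/image.jpg');
// $mime  = ($bytes === false) ? null : detectImageMime($bytes);
// if ($mime === null) { /* broken URL or not an image */ }
```

Because the decision is based on the file contents rather than the URL or the Content-Type header, a renamed non-image file is rejected at this step.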

You could consider not doing any server-side checks at all, and adding an onerror JavaScript event to the finished image resource:

<img src="..." onerror="this.style.display = 'none'">
Pekka
  • what if the file is not an image, i.e, renamed text file or other file? – ina Sep 10 '10 at 01:38
  • it's all in vain. he's refused to understand – Your Common Sense Sep 10 '10 at 02:25
  • @ina in that case, `getimagesize` will fail and you will know it is not an image. – Pekka Sep 10 '10 at 05:07
  • you're assuming too much of the php gd installation. on the current server, it will try to parse the file regardless, even if it's an .exe file or something - this is extra parsing in invoking gd when you can first check headers or extension to weed out such strays. – ina Sep 10 '10 at 16:00
  • also, why use `file_get_contents()` when `curl()` has caching built in? – ina Sep 10 '10 at 17:35
  • @ina you have no idea what you are talking about. `getimagesize()` has nothing to do with GD. It checks the *file format headers* (i.e. the first few bytes of the file) which is as good as a GD check, but without the overhead. It is the only waterproof and performance conscious way to parse an image file, take it or leave it. Regarding `curl` - of course you can use that instead of file_get_contents(), but I don't really see the point, especially about caching. What good is caching when you are trying to validate a resource? – Pekka Sep 10 '10 at 20:25
2
For local files, try:

<?php
if (file_exists($filename))
{
    // do what you want
}
else
{
    // report that the file does not exist
}
?>

For external domains:

$headers = @get_headers($url);
// get_headers() returns false on failure, so check before indexing
if ($headers !== false && preg_match('#^HTTP/\S+\s+200\b#', $headers[0])) {
    // file exists
} else {
    // file doesn't exist
}

You can also use a cURL request for the same check.

Luke Stevenson
Nik
  • this only works for local files - what if it's pointing to an external domain `http://anotherdomain.com/image.jpg` – ina Sep 09 '10 at 16:02
  • @ina, i forgot to add below code.Thanks for pointing the same. – Nik Sep 13 '10 at 07:33
1

Fast solution for broken or missing image links
I suggest not using getimagesize(), because it first has to fetch image data before it can check the size, and it fails with a warning if the data is not an image. Use the code below instead:

if (checkRemoteFile($imgurl))
{
    // the URL was found
    echo "the URL exists";
}

function checkRemoteFile($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // don't download content (sends a HEAD request)
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    // treat HTTP status codes >= 400 as failures
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    return $ok;
}

Note: this code only identifies broken or missing image URLs. It does not identify the image type or inspect the headers.

Hassan Saeed
0

You could check the HTTP status code (it should be 200) and the Content-type header (image/png etc.) of the HTTP response before you put the actual image through the generator.

If these two preconditions are ok, after retrieving the image you can call getimagesize() on it and see if it breaks, what MIME type it returns etc.
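
The two preconditions could be checked roughly as follows. This is a sketch under assumptions: looksLikeImageResponse() is a hypothetical name, and it expects the plain (non-associative) array that get_headers($url) returns, which may contain several responses when redirects are followed.

```php
<?php
// Hypothetical helper: inspects the header lines returned by get_headers($url)
// and reports whether the final response is "200" with an image/* Content-Type.
function looksLikeImageResponse(array $headers): bool
{
    $status = null;
    $contentType = null;

    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response (e.g. after a redirect),
            // so forget what was seen for the previous one.
            $status = (int) $m[1];
            $contentType = null;
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = strtolower(trim(substr($line, strlen('Content-Type:'))));
        }
    }

    return $status === 200
        && $contentType !== null
        && strpos($contentType, 'image/') === 0;
}

// Usage (network access assumed):
// $headers = @get_headers('http://example.com/image.png');
// $ok = is_array($headers) && looksLikeImageResponse($headers);
```

The Content-Type header is whatever the server claims, so this only filters out obvious strays; the getimagesize() check afterwards is still what confirms the file is an image.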

Alex Ciminian
0

Did you try the file_get_contents() function?

http://php.net/manual/en/function.file-get-contents.php

pMan
  • this will transfer unnecessary data just to check whether the file exists. Better to fetch only the headers; less work for the server. – Alex Sep 09 '10 at 18:20
  • 1
    @Alex: Given the OP's determination to detect faked MIME headers, there is no way *without* file_get_contents. +1 to even out the score – Pekka Sep 10 '10 at 05:09
0

onerror Event

Execute JavaScript when an error occurs while loading an image:

The onerror event is triggered if an error occurs while loading an external file (e.g. a document or an image).

Example:

<!DOCTYPE html>
<html>
<body>

<img src="image.gif" onerror="myFunction()">

<p>A function is triggered if an error occurs when loading the image. The function shows an alert box with a text.
In this example we refer to an image that does not exist, therefore the onerror event occurs.</p>

<script>
function myFunction() {
  alert('The image could not be loaded.');
}
</script>

</body>
</html>
Purvik Dhorajiya
0

Although not detecting broken links, might be useful for someone else...

onerror=""

<img src="PATH" onerror="this.src='NEW PATH'" />
James Osguthorpe