I know (from the answer to this question: .rar, .zip files MIME Type) that most people check zip files in PHP as application/zip or application/octet-stream, but I have a couple of questions about this:

  • Is it safe just to check for application/octet-stream, given that application/octet-stream can describe many more file types than just zip? I know I could check the file in other ways too, but thought I should try to keep everything as simple as possible.
  • I've tried to check for as many different actual zip types as possible, but some give unexpected results. I've found one for which the mime type is application/x-external-editor, and PHP has problems dealing with it (although the only error I get is Warning: ZipArchive::close() [ziparchive.close]: Invalid or unitialized Zip object). Is this documented anywhere? Is there a list of actual x- mime types which PHP can cope with?

Edit

In answer to the questions below:

  • I'm checking the mime type by using $_FILES['fileatt']['type'], but using mime_content_type() gives the same result. Different zip files seem to report any one of the following: 'application/zip', 'application/x-compressed', 'application/x-zip-compressed', 'multipart/x-zip'. I didn't understand why I got an error when the mime type was detected as application/x-external-editor.
  • I have got the zip extension installed, and I am extracting all the files from the zip files when they are uploaded. I hadn't thought about checking the error.

I have also found another thing I don't quite understand: when I use the following code with a file which PHP reads as application/x-external-editor:

// $zip is a ZipArchive instance; the array key must be quoted
if ($zip->open($_FILES['fileatt']['tmp_name']) === TRUE)
{
    echo "success";
} else {
    echo "error";
}

prints "error", but checking the return value as

$res = $zip->open($_FILES['fileatt']['tmp_name']);
if ($res) // loose truthiness check, not a strict comparison
{
    echo "success";
} else {
    echo "error";
}

prints "success"; in this code, I assume that the boolean is effectively using ==, not ===, but why should this make a difference?

The error:

$res = $zip->open($_FILES['fileatt']['tmp_name']);
if ($res === TRUE)
{
    echo "success";
} else {
    echo $res; // integer error code on failure
}

prints 19 - which error (http://uk3.php.net/manual/en/ziparchive.open.php) does 19 refer to?!

ChrisW
  • in answer to your first question: no, it's not safe and it won't tell you anything about the file. How are you checking this, different zip files shouldn't give you different results unless they really are zip files. PHP can handle anything given the right library. Show your code on how you're detecting the mime type. – Cfreak May 11 '12 at 20:39

1 Answer

Never trust the mime type; it can be easily spoofed by the client. They could submit an exe and give it a mime type of text/plain if they wanted to.

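Ordinary zip files begin with the standard local file header signature (0x04034b50, stored little-endian as the bytes 50 4B 03 04, i.e. "PK\x03\x04"), so you could check that the first 4 bytes of the file match the zip signature bytes. An empty archive is the exception: it contains only the end of central directory record and starts with "PK\x05\x06". See the PKZIP Appnote for more details.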
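The signature check described above could be sketched roughly as follows. The function name looksLikeZip() is hypothetical (it is not part of PHP or of the Appnote), and the sketch only tests for the local file header signature, so it would reject an empty archive and any zip with data prepended to it:

```php
<?php
// Hypothetical helper: returns true when the first four bytes of $path
// are the local file header signature "PK\x03\x04".
function looksLikeZip(string $path): bool
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable file: treat it as "not a zip"
    }

    $magic = fread($handle, 4);
    fclose($handle);

    // 0x04034b50 stored little-endian is the byte sequence 50 4B 03 04.
    return $magic === "PK\x03\x04";
}
```

For an upload you would call it as looksLikeZip($_FILES['fileatt']['tmp_name']). A matching signature only shows that the file starts like a zip; it does not prove the rest of the archive is intact.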

If you have the zip extension enabled, you can go even further and attempt to open and read the zip to make sure it is a fully valid zip file.

Something like this works well:

$zip = zip_open('/path/to/file.zip');
if (is_int($zip)) {
    echo "Error $zip encountered reading the file, is it a valid zip?";
} else {
    echo "Thanks for uploading a valid zip file!";
}

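zip_open returns a resource if the file was opened successfully, otherwise an integer representing the error that occurred while reading it. Note that the procedural zip_* functions are deprecated as of PHP 8.0; ZipArchive is the replacement.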

EDIT: To elaborate on some of your questions:

About application/octet-stream: This is, as you said, a very generic type. It just means any file that contains 8-bit data, which is basically anything and everything. application/zip is the de facto standard mime type, but some clients will use other values, as you have discovered. Also, given that a client can easily spoof any file type to use application/zip, I wouldn't rely on $_FILES['fileatt']['type'], since it can be anything.

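AFAIK, on some setups mime_content_type() simply looks at the file extension and maps it to a mime type from a mime.types file on the system or built into PHP. In that case, if someone put a .zip extension on an exe file it would still register as application/zip. I believe certain extensions examine the file header instead; in current PHP versions the function is provided by the fileinfo extension, which inspects the file contents against a magic database.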
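If you do want a server-side guess at the type that does not depend on what the client sent, a minimal sketch using the fileinfo extension might look like this. It assumes fileinfo is enabled, and sniffMimeType() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: asks the fileinfo extension for a MIME type based
// on the file's contents. Returns null if no type could be determined.
function sniffMimeType(string $path): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    return $type === false ? null : $type;
}
```

The result depends on the magic database shipped with your PHP build, so treat it as a hint rather than proof that the file is a usable zip.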

ZipArchive::open() returns TRUE if the file was opened successfully, or an integer error code otherwise. A loose check (== or a bare if ($res)) therefore gives you a false positive on an error, because any non-zero integer is cast to TRUE. If you are going to check the return from ZipArchive::open() you should always use $res === true to test for success. The 19 you are seeing is ZipArchive::ER_NOZIP, meaning the file is not a zip archive. You can find the meanings of the other error codes here in the comment at the bottom of the page.
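As a rough illustration of the strict check, here is a sketch that opens a file and names a few of the error codes. It assumes the zip extension is installed; describeZipOpen() is a hypothetical helper name, and the lookup table covers only a handful of the ZipArchive::ER_* constants:

```php
<?php
// Hypothetical helper: opens $path with ZipArchive and reports either the
// number of entries or the integer error code with a readable name.
function describeZipOpen(string $path): string
{
    $zip = new ZipArchive();
    $res = $zip->open($path);

    if ($res === true) { // strict: only boolean true means success
        $count = $zip->numFiles;
        $zip->close();
        return "opened, $count entries";
    }

    // Anything else is an error code; name the most common ones.
    $names = [
        ZipArchive::ER_READ   => 'ER_READ (read error)',
        ZipArchive::ER_NOENT  => 'ER_NOENT (no such file)',
        ZipArchive::ER_OPEN   => 'ER_OPEN (cannot open file)',
        ZipArchive::ER_NOZIP  => 'ER_NOZIP (not a zip archive)',
        ZipArchive::ER_INCONS => 'ER_INCONS (zip archive inconsistent)',
    ];

    return 'error ' . $res . ': ' . ($names[$res] ?? 'unknown code');
}
```

With a loose check the failure case slips through, because 19 == true is true in PHP while 19 === true is false.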

Bottom Line: Since you said you are already extracting the zip, there is little point in validating based on the mime type. It would be easier to just attempt to open the file and go by the return value of open(). If it returns true, you can assume the file is a valid zip (there could of course be errors later in the file, but they at least uploaded something resembling a zip file).

Hope that helps you out.

drew010
  • Thanks for the suggestions - I've updated my question with some more info too.. :) – ChrisW May 12 '12 at 19:00
  • @ChrisW I just updated the answer, hopefully that addresses your questions more. – drew010 May 12 '12 at 21:27
  • thanks very much. 1 more quick (although probably stupid!) question: `Zip::open` apparently returns a `resource` or an integer according to the docs. The `===` operator checks for type as well as equality, but why is a resource the same type as the boolean value `true` if an integer isn't? – ChrisW May 13 '12 at 00:05
  • Yeah that is confusing, but the object oriented interface returns boolean true on successful open and int on failure, and the procedural interface returns a resource (`=== true` will not work) on success and integer on failure. – drew010 May 13 '12 at 03:16