
I need to find out what kind of file a user has uploaded by checking its binary data, and I found a perfect solution for that over here.

Just to be specific this is the function I'm using:

function getImgType($filename) {
    $handle = @fopen($filename, 'rb'); // 'rb': read raw bytes (binary-safe on Windows)
    if (!$handle)
        throw new Exception('File Open Error');

    $types = array('jpeg' => "\xFF\xD8\xFF", 'gif' => 'GIF', 'png' => "\x89\x50\x4e\x47\x0d\x0a", 'bmp' => 'BM', 'psd' => '8BPS', 'swf' => 'FWS');
    $bytes = fread($handle, 8); // read exactly 8 bytes; fgets() would stop at a "\n" byte
    $found = 'other';

    foreach ($types as $type => $header) {
        if (strpos($bytes, $header) === 0) {
            $found = $type;
            break;
        }
    }
    fclose($handle);
    return $found;
}

Now my question is: how can I get the signature bytes for other file types, like .zip, .exe, .mp3, .mp4, etc.? If there is some kind of list out there, that would be great, though I would also like to extract them myself and learn how all of this really works.

Linas
  • Have you tried [`finfo_file`](http://www.php.net/manual/en/function.finfo-file.php)? And by the way, why don't you use this function to var_dump `$bytes` for zip, exe, etc.? – j0k Nov 22 '12 at 19:21
  • [finfo_file (english version)](http://www.php.net/manual/en/function.finfo-file.php) – GolezTrol Nov 22 '12 at 19:23
  • It is only supported in `PHP >= 5.3.0`, and I would like to support lower versions as well. – Linas Nov 22 '12 at 19:23
  • I would strongly advise against using older PHP versions, for security reasons. – Dirk McQuickly Nov 22 '12 at 19:26
  • This may help: http://www.garykessler.net/library/file_sigs.html – Aziz Nov 22 '12 at 19:26
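Following j0k's suggestion to dump the bytes yourself: a minimal sketch, assuming you have a few sample files of each type on disk, that prints the first bytes of a file as hex plus a printable-ASCII view. Comparing several files of the same type shows their common prefix, which is the signature. `dumpHeader()` is a hypothetical helper name, not a built-in PHP function.

```php
<?php
// Print the first $length bytes of a file as hex pairs and as ASCII,
// so signatures can be spotted by comparing files of the same type.
// dumpHeader() is a hypothetical helper written for this example.
function dumpHeader($filename, $length = 16)
{
    $handle = fopen($filename, 'rb'); // binary mode: bytes are not translated
    if ($handle === false) {
        throw new Exception('File Open Error');
    }
    $bytes = fread($handle, $length); // unlike fgets(), does not stop at "\n"
    fclose($handle);

    // Hex pairs separated by spaces, e.g. "50 4b 03 04"
    $hex = trim(chunk_split(bin2hex($bytes), 2, ' '));
    // Replace every non-printable byte with '.' for the ASCII view
    $ascii = preg_replace('/[^\x20-\x7E]/', '.', $bytes);

    return $hex . '  |' . $ascii . '|';
}
```

For example, running `echo dumpHeader('archive.zip', 4);` on a typical ZIP file prints `50 4b 03 04  |PK..|` (the file name is only an example).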

3 Answers


What you're looking for is called a file's magic number.

The magic number is one type of file signature: sometimes it takes more than the magic number alone to identify a file.
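To make "more than the magic number" concrete: WAV, AVI and WebP files are all RIFF containers, so they share the same 4 magic bytes `RIFF`, and the actual format is only named by the 4-byte form type at offset 8. A minimal sketch in PHP, using a hypothetical helper `riffFormType()` that only knows those three form types:

```php
<?php
// RIFF layout: bytes 0-3 = "RIFF", bytes 4-7 = little-endian chunk size,
// bytes 8-11 = form type ("WAVE", "AVI ", "WEBP", ...).
// riffFormType() is a hypothetical helper written for this example.
function riffFormType($bytes)
{
    if (strlen($bytes) < 12 || substr($bytes, 0, 4) !== 'RIFF') {
        return null; // not a RIFF container at all
    }
    $forms = array('WAVE' => 'wav', 'AVI ' => 'avi', 'WEBP' => 'webp');
    $form = substr($bytes, 8, 4); // the magic number alone cannot tell these apart
    return isset($forms[$form]) ? $forms[$form] : 'riff-other';
}
```

Here `$bytes` would be the first 12 (or more) bytes read from the file with `fread()`.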

A (very) short list of such numbers can be found here. A larger list can be found here.

File identification websites often mention the file's magic number too.

On Linux, the `file` command can be used to identify files. In PHP you can use the Fileinfo set of functions to identify files.
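A minimal sketch of the Fileinfo route, assuming PHP >= 5.3.0 where the extension is bundled and enabled by default. It uses the `finfo` class with `FILEINFO_MIME_TYPE`; `detectMimeType()` is a hypothetical wrapper name.

```php
<?php
// Ask Fileinfo (libmagic) for the MIME type based on the file's content,
// not its extension. Returns e.g. "image/gif", or false on failure.
// detectMimeType() is a hypothetical wrapper written for this example.
function detectMimeType($filename)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE); // object API, available since PHP 5.3.0
    return $finfo->file($filename);
}
```

Because detection reads the content, a temporary upload such as `$_FILES['upload']['tmp_name']`, which has no meaningful extension, can be passed in directly.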


By the way, you did not specify the kind of files you want to identify. Sometimes, identification might be the wrong solution. For example, people used to want to identify files before passing them to GD or saving them on the server as images. In this case, identification is not really your job. Instead, use the following code:

$data = file_get_contents('data.dat'); // The file might even contain a JPG... it is
                                       // still loaded without problems,
$image = imagecreatefromstring($data); // since this function just needs the
                                       // file's data, nothing more.
if ($image === false) {
    // GD could not parse the data as an image: reject the upload.
}
Christian
  • Well, it's always a race to the best answer, isn't it? :) – Christian Nov 22 '12 at 19:31
  • So if I understood correctly, the `mp3` magic number would be `49 44 33`? – Linas Nov 22 '12 at 19:43
  • Yes, but keep in mind that not every file that starts with `49 44 33` is an actual MP3 file. – Aziz Nov 22 '12 at 19:45
  • Hmm, the function I use right now doesn't find those numbers; instead it finds `ID3`. It's a little confusing. – Linas Nov 22 '12 at 19:48
  • Oh, it's `\x49\x44\x33`. Those are the bytes, represented in hexadecimal, equivalent to `ID3` in ASCII. – Aziz Nov 22 '12 at 20:03
  • Oh, now this makes sense. Last question: is there a difference if I use hex or ASCII? – Linas Nov 22 '12 at 20:12
  • @Linas Hex, decimal, octal etc are all number systems. They are interchangeable. For example, "20" in hex is "32" in decimal. ASCII on the other hand, is a character set, that is, a byte of decimal value 32 is a space character in ASCII. – Christian Nov 22 '12 at 20:22
  • [ID3 is the name](http://en.wikipedia.org/wiki/ID3) of the metadata record of MP3 files. It contains meta information about the file, like title, album and genre. So ID3 may actually be the header of an ID3 record. It could be that not all MP3 files have an ID3 record, and it could also be that other (music) files do have an ID3 record even though they're not MP3 files. – GolezTrol Nov 22 '12 at 21:43
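To illustrate the hex-versus-ASCII point from the comments: in PHP, `"\x49\x44\x33"` (hex escapes, which only work in double-quoted strings) and `'ID3'` are the same three bytes, so either spelling works as an entry in the `$types` array of the question's function. A short sketch:

```php
<?php
// Two spellings of the same three bytes: hex escapes vs. ASCII text.
$hexSignature   = "\x49\x44\x33"; // hex escapes require double quotes
$asciiSignature = 'ID3';

var_dump($hexSignature === $asciiSignature);      // bool(true)
echo bin2hex($asciiSignature), "\n";              // 494433
echo ord('I'), ' = 0x', dechex(ord('I')), "\n";   // 73 = 0x49
```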

What you're looking for is called "File Signatures", "Magic Bytes", or "Magic Numbers".

This page lists a lot of them for many file formats.

However, I wouldn't rely on them for identifying file formats. Use PHP's finfo_file instead.
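One reason not to rely on short signatures alone: a plain-text file that happens to start with "GIF" passes the 3-byte check from the question. The sketch below, using a hypothetical helper `looksLikeGif()`, compares that check with a stricter one based on the full GIF version prefixes `GIF87a`/`GIF89a`. Even the strict check only proves the first six bytes; anyone can prepend them to arbitrary data.

```php
<?php
// Compare the question's 3-byte prefix check with a stricter 6-byte one.
// looksLikeGif() is a hypothetical helper written for this example.
function looksLikeGif($bytes, $strict)
{
    if (!$strict) {
        return strpos($bytes, 'GIF') === 0; // the check used in getImgType()
    }
    $version = substr($bytes, 0, 6);
    return $version === 'GIF87a' || $version === 'GIF89a';
}

$text = "GIFT IDEAS:\n- socks\n";
var_dump(looksLikeGif($text, false)); // bool(true): a false positive
var_dump(looksLikeGif($text, true));  // bool(false)
```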

Aziz

Most files have a specific header or file signature or (apparently) magic number, which are different names for the same thing: a fixed set of bytes at the start of the file.

For instance, .exe files start with 'MZ', and .zip files usually start with the fixed 4-byte sequence 50 4B 03 04 ("PK\x03\x04").
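A sketch of the question's prefix-matching approach extended with the signatures mentioned in this thread (ZIP, EXE, MP3 via its ID3v2 tag). `getFileType()` is a hypothetical name, the list is illustrative rather than exhaustive, and a match is only a hint about the format, not proof.

```php
<?php
// Identify a file by comparing its first bytes against known signatures.
// getFileType() is a hypothetical helper written for this example.
function getFileType($filename)
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        throw new Exception('File Open Error');
    }
    $bytes = fread($handle, 8); // 8 bytes covers every signature below
    fclose($handle);

    $types = array(
        'jpeg' => "\xFF\xD8\xFF",
        'png'  => "\x89PNG\r\n\x1A\n", // full 8-byte PNG signature
        'gif'  => 'GIF8',               // GIF87a / GIF89a
        'zip'  => "PK\x03\x04",         // also ZIP-based formats like .docx, .jar
        'exe'  => 'MZ',                 // DOS/Windows executables and DLLs
        'mp3'  => 'ID3',                // ID3v2 tag; untagged MP3s start differently
    );
    foreach ($types as $type => $header) {
        // Compare only as many bytes as the signature is long
        if (strncmp($bytes, $header, strlen($header)) === 0) {
            return $type;
        }
    }
    return 'other';
}
```

Because ZIP-based formats such as .docx and .jar share the ZIP signature, telling them apart means looking inside the archive, which is where `finfo_file` is more thorough.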

This webpage contains a lot of file signatures: http://www.garykessler.net/library/file_sigs.html

If you search for .extension file format or .extension file header, you will usually find a description of the file format.

GolezTrol