I'm no stranger to MIME types, but this is strange. Normally a text file is considered to be text/plain, but after implementing fileinfo, this type of file is now reported as "text/x-pascal". I'm a little concerned because I need to be sure that I have the correct MIME types set before allowing users to upload files.

Is there a cheat sheet that lists all of the "common" MIME types as they are interpreted by fileinfo?


Sinan provided a link that lists the more common MIME types. According to that list, a .txt file is text/plain, but in my case a plain-Jane text file is interpreted as text/x-pascal.

Alix Axel
Jim

4 Answers

fileinfo only makes a "best guess". It analyzes just a portion of the file to try to figure out what type the file is, and as such it can be fooled easily enough. Perhaps your file starts with a Pascal comment or keyword such as Project or Unit.

Ignacio Vazquez-Abrams

Fileinfo does not use the file's extension to determine its MIME type; instead (quoting):

The functions in this module try to guess the content type and encoding of a file by looking for certain magic byte sequences at specific positions within the file.

The idea is that the file's name and extension are supplied by the users (especially in a case such as yours, where the files are being uploaded by users) and, as such, are less trustworthy than the content of the file itself.


Maybe a solution would be not to check the whole MIME type returned by fileinfo, but to use only its first part, at least in some cases?

For instance, you could accept all MIME types in the text/* and image/* families and refuse everything that looks like application/*, except for application/pdf.
(Just an example, but you see the point.)
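The family-based check suggested above could be sketched roughly like this. Python is used only as a neutral way to show the string logic; the function name `is_allowed_mime` and the two allow-lists are made up for illustration, and the input is assumed to be whatever MIME string fileinfo reported for the upload.

```python
# Hypothetical policy from the example above: whole families that are
# accepted, plus individual exceptions from otherwise refused families.
ALLOWED_FAMILIES = {"text", "image"}
ALLOWED_EXACT = {"application/pdf"}


def is_allowed_mime(detected: str) -> bool:
    """Return True if a detected MIME type passes the family-based policy."""
    # Drop parameters such as "; charset=us-ascii" and normalise case.
    mime = detected.split(";", 1)[0].strip().lower()
    family, sep, subtype = mime.partition("/")
    if not sep or not subtype:
        return False  # not a well-formed type/subtype pair
    if family in ALLOWED_FAMILIES:
        return True  # e.g. text/plain, text/x-pascal, image/png
    return mime in ALLOWED_EXACT  # e.g. application/pdf
```

With a check like this, a text file that gets guessed as text/x-pascal is still accepted, because only the `text` family is compared.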

Pascal MARTIN

I have found that, as of at least version 5.03, the 'file' command can in some circumstances misidentify a plain text file as a Pascal source file, simply because it contains the word 'program' or 'record'. At least that's how it looks from examining the source (src/names.h). I believe PHP's fileinfo extension uses the same 'magic' engine, so I suspect this is the cause of the problem. If/when I am accepted on the file mailing list, I will notify the maintainers of this issue.

[UPDATE] I asked the question, but got little in the way of a response. Having investigated this issue a bit more thoroughly, it turns out that identifying text formats is, in general, really difficult. If you get a 'text/*' MIME type back from file, you might want to consider ignoring the specific subtype and assuming the resource is just 'text/plain', unless the files you would then misclassify (real text/html, maybe) will cause you difficulties.
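A rough sketch of that fallback, again in Python purely to show the logic. The name `normalize_text_mime` and the `keep` parameter are my own inventions for illustration; `keep` is there for any text subtypes you decide you do need to tell apart.

```python
def normalize_text_mime(detected: str, keep=frozenset()) -> str:
    """Collapse any text/* guess to text/plain, except subtypes listed in keep."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = detected.split(";", 1)[0].strip().lower()
    if mime.startswith("text/") and mime not in keep:
        return "text/plain"  # treat the specific text subtype as unreliable
    return mime  # non-text types and explicitly kept subtypes pass through
```

For example, `normalize_text_mime("text/x-pascal")` gives `text/plain`, while `normalize_text_mime("text/html", keep={"text/html"})` leaves the type alone.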

Community
Andy Jackson

There is a chart that lists common MIME types and their corresponding extensions here.

Bazindrix
Sinan
  • Thank you very much Sinan. Going there now. – Jim Feb 27 '10 at 01:41
  • On that link that you gave, for example, text/pascal is associated with the .pas extension. In my case, a plain text file is being interpreted as text/pascal for some weird reason. – Jim Feb 27 '10 at 01:43