
I'm developing a file upload with JSF. The application saves three pieces of data about the file:

  • Filename
  • Bytes
  • Content-Type as submitted by the browser.

My problem is that some files are saved with content type = application/octet-stream even if they are *.doc or *.pdf files.

When does the browser submit such a content type?
I would like to clean up the database, so I need to know when the browser information is incorrect.

guerda
  • Hmm ... I can't make Firefox use a bad MIME type even if I mess up my system mime.types file, so I'm not sure what the browsers might be doing to pass a Content-type header. – Pointy Mar 11 '10 at 16:46
  • @Pointy: Unfortunately there are more browsers in the world than only FF. For example the one developed by (cough) a team in Redmond. – BalusC Mar 11 '10 at 16:56
  • Yes of course - oddly enough I'd expect IE to get the MIME type wrong, but not in that way. (I'd expect it to provide "application/pdf" for a JPEG file whose name happened to be "bogus.pdf", for example.) – Pointy Mar 11 '10 at 17:01
  • Oh, and in the file upload case I'm recently familiar with (my own app), I pay no attention to that and use a server-side sniffer (Image Magick, in this case) to determine file type. – Pointy Mar 11 '10 at 17:02
  • possible duplicate of [How is mime type of an uploaded file determined by browser?](http://stackoverflow.com/questions/1201945/how-is-mime-type-of-an-uploaded-file-determined-by-browser) – Ciro Santilli OurBigBook.com Feb 17 '15 at 15:35

2 Answers


Ignore the value sent by the browser. This is indeed dependent on the client platform, browser and configuration used.

If you want full control over content types based on the file extension, it's better to determine them yourself using ServletContext#getMimeType().

String mimeType = servletContext.getMimeType(filename);

The default mime types are defined in the web.xml of the servletcontainer in question. In Tomcat, for example, it's located in /conf/web.xml. You can extend/override it in the webapp's /WEB-INF/web.xml as follows:

<mime-mapping>
    <extension>xlsx</extension>
    <mime-type>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime-type>
</mime-mapping>
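
If you need the same kind of extension lookup outside the servletcontainer (for example in a one-off job that cleans up the database), a small map can stand in for the mime-mapping entries. This is only a sketch: the class ExtensionMimeTypes, its lookup() method and the three entries are made up for illustration and are not part of any servlet API. Like ServletContext#getMimeType(), it returns null for unknown extensions.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: mirrors a few <mime-mapping> entries for use
// outside a servlet container. The entries are an illustrative subset.
final class ExtensionMimeTypes {

    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    private ExtensionMimeTypes() {
    }

    /** Returns the mapped MIME type, or null if the extension is missing or unknown. */
    static String lookup(String filename) {
        if (filename == null) {
            return null;
        }
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null; // no extension at all, or a trailing dot
        }
        // Extensions are compared case-insensitively ("REPORT.PDF" -> "pdf").
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }
}
```

Rows stored as application/octet-stream could then be re-checked with ExtensionMimeTypes.lookup(filename), leaving the stored value alone whenever it returns null.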

You can also determine the mime type based on the actual file content (the file extension is not necessarily accurate, since the client can fake it), but this is a lot of work. Consider using a 3rd party library to do all the work. I've found JMimeMagic useful for this. You can use it as follows:

String mimeType = Magic.getMagicMatch(file, false).getMimeType();

Note that it doesn't detect all mime types equally reliably. You can also consider a combination of both approaches. E.g. if one returns null or application/octet-stream, use the other. Or if the two return different but "valid" mime types, prefer the one returned by JMimeMagic.
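
The combination rule could look roughly like the sketch below. The class MimeTypeResolver and its methods are hypothetical names; byExtension is assumed to be the result of ServletContext#getMimeType() and byContent the result of the content-based detection (e.g. JMimeMagic). Both are passed in as plain strings so that the rule itself has no dependencies.

```java
// Hypothetical helper illustrating the fallback rule; not part of any library.
final class MimeTypeResolver {

    private static final String OCTET_STREAM = "application/octet-stream";

    private MimeTypeResolver() {
    }

    /** A result is "useful" if it is neither missing nor the generic fallback type. */
    static boolean isUseful(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        String trimmed = mimeType.trim();
        return !trimmed.isEmpty() && !OCTET_STREAM.equalsIgnoreCase(trimmed);
    }

    /**
     * Prefers the content-based result, falls back to the extension-based one,
     * and returns application/octet-stream only when neither is useful.
     */
    static String resolve(String byExtension, String byContent) {
        if (isUseful(byContent)) {
            return byContent;
        }
        if (isUseful(byExtension)) {
            return byExtension;
        }
        return OCTET_STREAM;
    }
}
```

With this, resolve("application/msword", "application/pdf") yields the content-based application/pdf, while resolve("application/pdf", null) falls back to the extension-based value.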

Oh, I almost forgot to add, in JSF you can obtain the ServletContext as follows:

ServletContext servletContext = (ServletContext) FacesContext.getCurrentInstance().getExternalContext().getContext();

Or if you happen to use JSF 2.x already, use ExternalContext#getMimeType() instead.

BalusC

It depends on the OS, the browser, and how the user has configured them. It's based on the way the browser determines the file type of local files (to display them). On most OS/browser combinations this is based on the file's extension, but on some (e.g. Mac OS) it may be determined by other means.

In any case, you shouldn't really rely on the Content-type sent by the browser. The best approach would be to actually look at the contents of the file. You could probably also use the filename, but keep in mind that browsers aren't necessarily going to be good about telling you that either (though it's probably still a lot more reliable than the Content-type they send).
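
As a rough illustration of "looking at the contents" for the two file types from the question: PDF files normally start with the bytes "%PDF-", and legacy binary Office files such as *.doc start with the 8-byte OLE2 compound file signature. The sketch below checks only those two signatures on the stored bytes. ContentSniffer and Kind are made-up names, and the OLE2 signature is shared by *.doc, *.xls and *.ppt, so it narrows things down without proving the file is a Word document.

```java
import java.util.Arrays;

// Hypothetical sniffer that checks two well-known leading byte signatures.
final class ContentSniffer {

    enum Kind { PDF, OLE2_COMPOUND, UNKNOWN }

    // "%PDF-" in ASCII.
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};

    // OLE2 compound file header, used by legacy .doc/.xls/.ppt files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private ContentSniffer() {
    }

    /** Classifies the content by its leading bytes only. */
    static Kind sniff(byte[] content) {
        if (startsWith(content, PDF_MAGIC)) {
            return Kind.PDF;
        }
        if (startsWith(content, OLE2_MAGIC)) {
            return Kind.OLE2_COMPOUND;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Anything that comes back as UNKNOWN is better left untouched or handed to a real detection library than guessed at.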

Laurence Gonsalves