27

I have a web page that can be used to upload files.
Now I need to check whether the file type is correct (zip, jpg, pdf, ...).

I can use the mimeType that comes with the request, but I don't trust the user. Let's say I want to be sure that nobody can upload a .gif file that was renamed to .jpg.
I think that in this case I should inspect the magic number.
This is a Java library I've found that seems to do what I need: extract the MIME type from the magic number.
Is this a correct solution, or what would you suggest?

UPDATE: I've found the mime-util project and it seems very good and up to date! (Maybe better than the Java Mime Magic Library?)
Here is a list of utility projects that can help you extract MIME types.

Gray
mickthompson

3 Answers

24

Try Java Mime Magic Library

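import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;
byte[] data = ...
// getMagicMatch throws a checked exception when no known signature matches
MagicMatch match = Magic.getMagicMatch(data);
String mimeType = match.getMimeType();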
sfussenegger
  • 1
    It does not detect docx files correctly; it keeps giving application/zip as the MIME type. – Oscar Pérez Feb 07 '13 at 09:53
  • 1
    @OscarPérez A docx is indeed a zip archive containing a bunch of XML files, so it's technically correct. You could inspect the archive yourself to see if it is a docx or similar. This would probably be out of scope for this small library. – sfussenegger Feb 18 '13 at 14:57
  • @sfussenegger What can you say about this SO question [check file of MIME-type with JMimeMagic](http://stackoverflow.com/questions/15325047/check-file-of-mime-type-with-jmimemagic)? – catch23 Mar 11 '13 at 15:09
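As the comment above suggests, telling a docx from a plain ZIP means looking inside the archive. Here is a minimal sketch using only the JDK, assuming the main document part sits at the conventional path `word/document.xml`. `DocxSniffer` and `looksLikeDocx` are hypothetical names for illustration and are not part of JMimeMagic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of any library mentioned here.
final class DocxSniffer {
    private DocxSniffer() {}

    /**
     * Returns true if the bytes form a ZIP archive that contains an entry
     * named word/document.xml (assumed here to mark a Word document).
     */
    static boolean looksLikeDocx(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            // getNextEntry() returns null at the end of the archive,
            // or immediately when the data is not a ZIP at all.
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

This only checks that the entry exists, so a deliberately crafted ZIP would still pass. Treat it as a refinement on top of the magic-number check, not a validation of the document.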
10

I'm sure the library posted by @sfussenegger is the best solution, but I do it by hand with the following snippet, which I hope helps.

// Enum constants: a label plus one or more magic-number prefixes per file type.
DESCONOCIDO("desconocido", new byte[][] {}), // "unknown"
PDF("PDF", new byte[][] { { 0x25, 0x50, 0x44, 0x46 } }), // "%PDF"
// JPEG: the fourth byte varies (0xe0 for JFIF, 0xe1 for Exif), so match only three.
JPG("JPG", new byte[][] { { (byte) 0xff, (byte) 0xd8, (byte) 0xff } }),
RAR("RAR", new byte[][] { { 0x52, 0x61, 0x72, 0x21 } }), // "Rar!"
GIF("GIF", new byte[][] { { 0x47, 0x49, 0x46, 0x38 } }), // "GIF8"
PNG("PNG", new byte[][] { { (byte) 0x89, 0x50, 0x4e, 0x47 } }),
ZIP("ZIP", new byte[][] { { 0x50, 0x4b } }), // "PK"
TIFF("TIFF", new byte[][] { { 0x49, 0x49 }, { 0x4D, 0x4D } }), // "II" or "MM"
BMP("BMP", new byte[][] { { 0x42, 0x4d } }); // "BM"

Regards.

PS: The best part is that it has no dependencies. PS2: No warranty about its correctness! PS3: "desconocido" is Spanish for "unknown".
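The snippet above shows only the enum constants, so here is a minimal, self-contained sketch of how they might be wired into a lookup. The enum name `FileSignature`, the `detect` method, and the `startsWith` helper are hypothetical and not taken from the original answer. JPEG is matched on its first three bytes here, because the fourth byte differs between JFIF and Exif files.

```java
import java.util.Arrays;

// Hypothetical enum name; the original snippet does not show the enclosing type.
enum FileSignature {
    PDF(new byte[][] { { 0x25, 0x50, 0x44, 0x46 } }),
    JPG(new byte[][] { { (byte) 0xff, (byte) 0xd8, (byte) 0xff } }),
    RAR(new byte[][] { { 0x52, 0x61, 0x72, 0x21 } }),
    GIF(new byte[][] { { 0x47, 0x49, 0x46, 0x38 } }),
    PNG(new byte[][] { { (byte) 0x89, 0x50, 0x4e, 0x47 } }),
    ZIP(new byte[][] { { 0x50, 0x4b } }),
    TIFF(new byte[][] { { 0x49, 0x49 }, { 0x4d, 0x4d } }),
    BMP(new byte[][] { { 0x42, 0x4d } }),
    UNKNOWN(new byte[][] {});

    private final byte[][] signatures;

    FileSignature(byte[][] signatures) {
        this.signatures = signatures;
    }

    /** Returns the first type whose signature is a prefix of data, else UNKNOWN. */
    static FileSignature detect(byte[] data) {
        for (FileSignature type : values()) {
            for (byte[] signature : type.signatures) {
                if (startsWith(data, signature)) {
                    return type;
                }
            }
        }
        return UNKNOWN;
    }

    /** True when data is at least as long as prefix and begins with it. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

With this, the leading bytes of a GIF that was renamed to .jpg still resolve to `FileSignature.GIF`, so the upload handler can compare the detected type with the claimed one. The two-byte signatures (ZIP, TIFF, BMP) are short enough to match unrelated files by chance, so a positive result is a hint, not proof.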

ATorras
-2

The activation framework is Sun's answer to this, and you may well already have it on the classpath of your app server.

James B
  • I tried the activation framework's getContentType() on some .pdf and .xls files, but unfortunately the method always returns 'application/octet-stream'. Only for .txt does it give something like 'text/plain'. – mickthompson Dec 16 '09 at 16:35
  • 1
    Actually, getContentType() only maps the file based on its extension and a map of MIME types that you provide... this is not what I'm looking for. – mickthompson Dec 16 '09 at 16:44
  • I agree, that's not what you're looking for! – James B Dec 17 '09 at 13:44
  • 5
    Linking to an IP address is weird. – blong Sep 25 '13 at 18:19
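The comments above come down to the difference between guessing from the file name and guessing from the content. That difference can be illustrated with the JDK alone. This sketch does not use the activation framework; it swaps in `java.net.URLConnection`, which offers both a name-based and a stream-based guess. `ContentTypeGuess` and its two methods are hypothetical wrapper names, and the stream-based guess recognises only a small set of formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical wrapper for illustration; only the URLConnection calls are JDK API.
final class ContentTypeGuess {
    private ContentTypeGuess() {}

    /** Guess from the file name only, so a renamed file fools it. */
    static String fromName(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }

    /** Guess from the leading bytes; returns null when the format is not recognised. */
    static String fromContent(byte[] data) throws IOException {
        // ByteArrayInputStream supports mark/reset, which the JDK method requires.
        try (InputStream in = new ByteArrayInputStream(data)) {
            return URLConnection.guessContentTypeFromStream(in);
        }
    }
}
```

For a GIF uploaded as `photo.jpg`, `fromName` only ever sees the extension, while `fromContent` looks at the `GIF8` header. A mismatch between the two is a reasonable signal to reject the upload.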