HowTo extract MimeType from a byte[]

Question

I have a web page that can be used to upload files.
Now I need to check if the file type is correct (zip, jpg, pdf,...).

I can use the mimeType that comes with the request, but I don't trust the user. Let's say I want to be sure that nobody is able to upload a .gif file that was renamed to .jpg.
I think that in this case I should inspect the magic number.
This is a Java library I've found that seems to do what I need: extract the MIME type from the magic number.
Is this a correct solution or what do you suggest?

UPDATE: I've found the mime-util project and it seems very good and up-to-date! (Maybe better than the Java Mime Magic Library?)
Here is a list of utility projects that can help you to extract MIME types.

sfussenegger · Accepted Answer · 2009-12-16T16:13:22.287

24

Try Java Mime Magic Library

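import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;

byte[] data = ...
// May throw MagicParseException, MagicMatchNotFoundException or MagicException
MagicMatch match = Magic.getMagicMatch(data);
String mimeType = match.getMimeType();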

edited Dec 16 '09 at 16:13

answered Dec 16 '09 at 15:25


It does not detect docx files correctly. It keeps returning application/zip as the MIME type. – Oscar Pérez Feb 07 '13 at 09:53
@OscarPérez A docx is indeed a zip archive containing a bunch of XML files, so it's technically correct. You could inspect the archive yourself to see if it is a docx or similar. This would probably be out of scope for this small library. – sfussenegger Feb 18 '13 at 14:57
@sfussenegger What can you say about this SO question [check file of MIME-type with JMimeMagic](http://stackoverflow.com/questions/15325047/check-file-of-mime-type-with-jmimemagic)? – catch23 Mar 11 '13 at 15:09
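
To illustrate the suggestion above about inspecting the archive yourself, here is a rough sketch using only `java.util.zip` from the JDK. The class name `OoxmlSniffer` and the method `looksLikeDocx` are made up for this example, and it assumes the main document part sits at the conventional path `word/document.xml`, which is where Word normally puts it but is not strictly guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper, not part of jMimeMagic or any other library. */
final class OoxmlSniffer {

    /**
     * Returns true if the bytes are a ZIP archive containing word/document.xml.
     * Assumption: that entry name is the conventional main part of a .docx.
     */
    static boolean looksLikeDocx(byte[] data) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            // getNextEntry() returns null when there are no (more) entries
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return true;
                }
            }
        } catch (IOException e) {
            // Corrupt or unreadable archive: treat it as "not a docx"
            return false;
        }
        return false;
    }
}
```

One way to combine the two steps would be to run this check only when the magic-number lookup has already reported application/zip.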

score 10 · Answer 2 · answered Dec 16 '09 at 15:33

I'm sure the library posted by @sfussenegger is the best solution, but I do it by hand with the following snippet, which I hope helps you.

// Enum constants (fragment): a label plus one or more magic-number prefixes
DESCONOCIDO("desconocido", new byte[][] {}),
PDF("PDF", new byte[][] { { 0x25, 0x50, 0x44, 0x46 } }),
JPG("JPG", new byte[][] { { (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0 } }),
RAR("RAR", new byte[][] { { 0x52, 0x61, 0x72, 0x21 } }),
GIF("GIF", new byte[][] { { 0x47, 0x49, 0x46, 0x38 } }),
PNG("PNG", new byte[][] { { (byte) 0x89, 0x50, 0x4e, 0x47 } }),
ZIP("ZIP", new byte[][] { { 0x50, 0x4b } }),
TIFF("TIFF", new byte[][] { { 0x49, 0x49 }, { 0x4D, 0x4D } }),
BMP("BMP", new byte[][] { { 0x42, 0x4d } });

Regards.

PS: The best part is that it doesn't have any dependencies.
PS2: No warranty about its correctness!
PS3: "desconocido" means "unknown" in Spanish.
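
For readers who want something they can compile, here is one possible way to complete the idea of a hand-rolled signature table. This is only a sketch, not the answerer's actual code: the enum name `FileSignature`, the `detect` method, and the choice to return MIME type strings are assumptions made for this example, and it covers only a subset of the types. It also deliberately matches JPEG on the first three bytes only, since the fourth byte differs between JFIF (0xE0) and Exif (0xE1) files.

```java
/** Hypothetical enum: a MIME type plus one or more magic-number prefixes. */
enum FileSignature {
    PDF("application/pdf", new int[][] {{0x25, 0x50, 0x44, 0x46}}),  // "%PDF"
    JPEG("image/jpeg", new int[][] {{0xFF, 0xD8, 0xFF}}),            // 4th byte varies
    GIF("image/gif", new int[][] {{0x47, 0x49, 0x46, 0x38}}),        // "GIF8"
    PNG("image/png", new int[][] {{0x89, 0x50, 0x4E, 0x47}}),        // 0x89 "PNG"
    ZIP("application/zip", new int[][] {{0x50, 0x4B}}),              // "PK"
    TIFF("image/tiff", new int[][] {{0x49, 0x49}, {0x4D, 0x4D}}),    // "II" or "MM"
    BMP("image/bmp", new int[][] {{0x42, 0x4D}}),                    // "BM"
    UNKNOWN("application/octet-stream", new int[][] {});

    final String mimeType;
    private final int[][] prefixes;

    FileSignature(String mimeType, int[][] prefixes) {
        this.mimeType = mimeType;
        this.prefixes = prefixes;
    }

    /** Returns the first constant whose prefix matches the start of data, else UNKNOWN. */
    static FileSignature detect(byte[] data) {
        if (data == null) {
            return UNKNOWN;
        }
        for (FileSignature candidate : values()) {
            for (int[] prefix : candidate.prefixes) {
                if (startsWith(data, prefix)) {
                    return candidate;
                }
            }
        }
        return UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare the signed Java byte as an unsigned value
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For the upload case in the question, you would compare `FileSignature.detect(uploadedBytes).mimeType` with the type the client claimed and reject the file on a mismatch. Keep in mind that two-byte prefixes such as "PK", "BM", "II" and "MM" are weak evidence and can produce false positives.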

score -2 · Answer 3 · answered Dec 16 '09 at 15:31

The JavaBeans Activation Framework is Sun's answer to this, and you may well already have it on the classpath of your app server.

James B


I tried the activation framework's getContentType() on some .pdf and .xls files, but unfortunately the method always returns 'application/octet-stream'. Only for .txt does it return something like 'text/plain'. – mickthompson Dec 16 '09 at 16:35
Actually, getContentType() only maps the file based on its extension and a map of MIME types that you provide... This is not what I'm looking for. – mickthompson Dec 16 '09 at 16:44
I agree, that's not what you're looking for! – James B Dec 17 '09 at 13:44
Linking to an IP address is weird. – blong Sep 25 '13 at 18:19
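
The difference discussed in these comments, guessing from the file extension versus guessing from the content, can be seen with two static methods on the JDK's own `java.net.URLConnection`. To be clear, this is a different API from the activation framework and is used here only as a stand-in for illustration; the wrapper class `JdkContentTypeGuess` and its method names are invented for this example, and the JDK's content sniffing recognizes only a small set of formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;

/** Hypothetical wrapper around two JDK methods; not related to the activation framework. */
final class JdkContentTypeGuess {

    /** Extension-based guess: only the name is looked at, the bytes are never read. */
    static String fromName(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }

    /** Content-based guess: inspects the leading bytes; null if nothing is recognized. */
    static String fromBytes(byte[] data) {
        try {
            // ByteArrayInputStream supports mark/reset, which the JDK method requires
            return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(data));
        } catch (IOException e) {
            return null;
        }
    }
}
```

With a GIF that was renamed to .jpg, `fromName` can only go by the name it is given, while `fromBytes` looks at the "GIF8" header, so the two answers may disagree. That disagreement is exactly the signal the question is after.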
