
Simple question: how can I get the MIME type (or content type) of an InputStream, without saving the file, for a file that a user is uploading to my servlet?

Raedwald
Trick
  • Where is the InputStream coming from? If it's just a generic input stream with some series of bytes, they're "untyped" and you won't know without reading the content itself and determining the type. But if you're getting the bytes from (say) an HTTP connection, there are sideband headers that can tell you what you want. – Ben Zotto Jan 05 '11 at 08:29
  • It is coming from user uploading file(s). – Trick Jan 05 '11 at 08:49
  • You could try the `MimeUtils` library. – herrtim Apr 21 '16 at 13:28

8 Answers


I wrote my own content-type detector for a byte[] because the libraries above weren't suitable or I didn't have access to them. Hopefully this helps someone out.

// retrieve file as byte[]
byte[] b = odHit.retrieve( "" );

// copy the top 32 bytes (fewer if the file is shorter) and pass them to guessMimeType(byte[])
byte[] topOfStream = new byte[Math.min(32, b.length)];
System.arraycopy(b, 0, topOfStream, 0, topOfStream.length);
String mimeGuess = guessMimeType(topOfStream);

...

private static String guessMimeType(byte[] topOfStream) {
    // needs: java.io.FileInputStream, java.io.IOException,
    //        java.nio.charset.StandardCharsets, java.util.Properties

    // Read in the magicmimes.properties file (example listed below)
    Properties magicmimes = new Properties();
    try (FileInputStream in = new FileInputStream("magicmimes.properties")) {
        magicmimes.load(in);
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
        return null;
    }

    // Decode the header as ISO-8859-1: each byte 0x00-0xFF becomes the char with the
    // same value (byte 0x89 -> char 0x89), so it can be compared with the escaped keys
    // from the properties file. The platform default charset (often UTF-8) would
    // mangle bytes above 0x7F, which is why PNG and JPEG signatures failed to match.
    String header = new String(topOfStream, StandardCharsets.ISO_8859_1);

    // loop over each file signature; if the header starts with it, return its mime type
    for (String key : magicmimes.stringPropertyNames()) {
        if (header.startsWith(key)) {
            String mimeType = magicmimes.getProperty(key);
            System.out.println("Mime Found! " + mimeType);
            return mimeType;
        }
    }

    return null;
}

Example magicmimes.properties file (I'm not sure all these signatures are correct, but they worked for my uses):

# SignatureKey                  content/type
\u0000\u201E\u00f1\u00d9        text/plain
\u0025\u0050\u0044\u0046        application/pdf
%PDF                            application/pdf
\u0042\u004d                    image/bmp
GIF8                            image/gif
\u0047\u0049\u0046\u0038        image/gif
\u0049\u0049\u002A\u0000        image/tiff
\u004D\u004D\u0000\u002A        image/tiff
\u0089\u0050\u004e\u0047        image/png
\u00ff\u00d8\u00ff\u00e0        image/jpeg
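The code above starts from a byte[] holding the whole file. For the question's case (an upload InputStream you don't want to save), a sketch like the following could peek at the first bytes instead, assuming the stream supports mark/reset (for example, because it has been wrapped in a BufferedInputStream). `HeaderPeek` and `readHeader` are hypothetical names; the returned array could then be handed to `guessMimeType`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class HeaderPeek {

    // Reads up to n bytes from the current position, then rewinds the stream so the
    // caller can still consume (or save) the complete upload afterwards.
    // Assumes the stream supports mark/reset, e.g. a BufferedInputStream.
    static byte[] readHeader(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(n);                          // remember where we started
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {                  // a single read() may return fewer bytes
            int r = in.read(buf, total, n - total);
            if (r == -1) {
                break;                       // stream is shorter than n bytes
            }
            total += r;
        }
        in.reset();                          // go back to the marked position
        return Arrays.copyOf(buf, total);
    }
}
```

Usage would look like `InputStream upload = new BufferedInputStream(uploadStream);` followed by `guessMimeType(HeaderPeek.readHeader(upload, 32))`, where `uploadStream` stands for the servlet's upload stream; afterwards `upload` can still be read from its first byte.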
Kit
  • Note that this won't work for PNG, for example, whose first byte is 137. Since byte is signed in Java (it can't hold values larger than 127), it gets converted to -119. What I did was read the InputStream into an int[4] array using the InputStream#read() method, which returns each byte as an int (0-255), so it doesn't get converted. Thanks for your answer, anyway! – jFrenetic Aug 10 '15 at 16:01

According to Real Gagnon's excellent site, the better solution for your case would be to use Apache Tika.

Riduidel

It depends on where you are getting the input stream from. If you are getting it from a servlet, then the content type is accessible through the HttpServletRequest object that is an argument of doPost. Note that for a multipart/form-data upload the request's own Content-Type is multipart/form-data; the uploaded file's type is in the headers of its part, which Servlet 3.0+ exposes through Part#getContentType(). If you are using some sort of REST API like Jersey, then the request can be injected using @Context. If you are uploading the file through a socket, it will be your responsibility to specify the MIME type as part of your protocol, as you will not inherit the HTTP headers.

LINEMAN78
  • One of the examples with actual code - https://stackoverflow.com/questions/10600013/http-415-on-file-upload-using-jersey/12183755#comment85486792_12183755 – Saurabh Gupta Mar 12 '18 at 18:12

I'm a big proponent of "do it yourself first, then look for a library solution". Luckily, this case is simple enough to do yourself.

You have to know the file's "magic number", i.e. its signature. Here is an example of detecting whether the InputStream represents a PNG file.

The PNG signature consists of the following bytes, in hex:

1) 0x89, a non-ASCII byte with the high bit set (it lets a decoder detect transmission channels that strip the 8th bit)

2) the string "PNG" in ASCII:

     P - 0x50
     N - 0x4E
     G - 0x47

3) CR (carriage return) - 0x0D

4) LF (line feed) - 0x0A

5) SUB (substitute) - 0x1A

6) LF (line feed) - 0x0A

So, the magic number is

89   50 4E 47 0D 0A 1A 0A (hex)

137  80 78 71 13 10 26 10 (decimal)
-119 80 78 71 13 10 26 10 (as signed Java bytes)

Explanation of 137 -> -119 conversion

An N-bit number can represent 2^N different values. For a byte (8 bits) that is 2^8 = 256 values, i.e. the range 0..255. Java's byte primitive is signed, so its range is -128..127. Thus the bit pattern for 137 is interpreted as a signed value and represents -119 = 137 - 256.
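The conversion can be checked directly in Java: casting 137 to byte yields -119, and masking with 0xFF recovers the unsigned value when the byte is widened back to int. A minimal sketch (the class and method names are illustrative only):

```java
final class SignedBytes {

    // 137 does not fit in a signed byte (-128..127), so the cast wraps it: 137 - 256 = -119.
    static byte toSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    // Masking with 0xFF discards the sign extension added when the byte is widened to int.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = toSignedByte(137);
        System.out.println(b);              // -119
        System.out.println(toUnsigned(b));  // 137
    }
}
```

Masking is usually more convenient than writing negative literals, because the signature can then be compared against the hex values from the spec as-is.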

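Example in Kotlin

// Note: this consumes the first 8 bytes of the stream.
private fun InputStream.isPng(): Boolean {
    val magicNumbers = intArrayOf(-119, 80, 78, 71, 13, 10, 26, 10)
    val signatureBytes = ByteArray(magicNumbers.size)
    // readNBytes (Java 9+) keeps reading until the array is full or the stream ends;
    // a single read() call may return fewer bytes than requested
    val count = readNBytes(signatureBytes, 0, signatureBytes.size)
    return count == magicNumbers.size &&
        signatureBytes.map { it.toInt() }.toIntArray().contentEquals(magicNumbers)
}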

Of course, to support many MIME types you have to scale this solution somehow, and if you are not happy with the result, consider a library.
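One way to scale it, sketched in Java under the assumption that a handful of well-known signatures is enough: keep a table of signature to MIME type and compare unsigned byte values, so 0x89 can be written as-is instead of -119. The entries are the standard PNG, GIF, PDF and JPEG magic numbers; `MagicTable` and `detect` are hypothetical names, and unrecognised content returns null.

```java
import java.util.List;
import java.util.Map;

final class MagicTable {

    // Well-known signatures, written as unsigned byte values (0..255).
    private static final List<Map.Entry<int[], String>> SIGNATURES = List.of(
            Map.entry(new int[] {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}, "image/png"),
            Map.entry(new int[] {0x47, 0x49, 0x46, 0x38}, "image/gif"),        // "GIF8"
            Map.entry(new int[] {0x25, 0x50, 0x44, 0x46}, "application/pdf"),  // "%PDF"
            Map.entry(new int[] {0xFF, 0xD8, 0xFF}, "image/jpeg"));

    // Returns the MIME type of the first signature that prefixes the header, or null.
    static String detect(byte[] header) {
        for (Map.Entry<int[], String> e : SIGNATURES) {
            if (startsWith(header, e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] header, int[] signature) {
        if (header.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((header[i] & 0xFF) != signature[i]) {   // compare as unsigned
                return false;
            }
        }
        return true;
    }
}
```

Adding a format is then one more table entry; once the list grows long or formats need deeper inspection (e.g. Office files inside ZIP containers), a library is the more practical route.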

Sevastyan Savanyuk

You can just add tika-app-1.x.jar to your classpath, as long as you don't use SLF4J logging anywhere else, because it will cause a collision. If you use Tika to detect the type of an InputStream, the stream has to support mark/reset; otherwise, calling Tika will consume the bytes it reads from your stream. One way around this is to use the Apache Commons IO library to copy the InputStream into a file and detect the type from that file:

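import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
import org.apache.tika.Tika;

Tika tika = new Tika();
File tmp = new File("c:/tmp.tmp");
InputStream in = null; // replace with the uploaded file's InputStream
try {
   try (OutputStream out = new FileOutputStream(tmp)) {
      IOUtils.copy(in, out);              // save the upload to disk
   }
   String mimeType = tika.detect(tmp);    // detect from the saved file, not from the OutputStream
} catch (IOException e) {
   System.err.println(e);
} finally {
   if (in != null) {
      try { in.close(); } catch (IOException ignored) { }
   }
}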
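The temporary file is only needed because the stream may not support mark/reset; wrapping it in a BufferedInputStream usually avoids that. To show the idea without extra jars, this sketch swaps in the JDK's own `URLConnection.guessContentTypeFromStream`, a far more limited detector than Tika that recognises only a few formats (GIF, PNG, JPEG, HTML/XML, among others). It returns null for streams without mark support, and after it peeks at the header the wrapped stream still yields every byte. As far as I know, Tika's `detect(InputStream)` likewise marks and resets a mark-supported stream, so the same wrapping should apply there. `StreamSniffer` and its methods are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

final class StreamSniffer {

    // Wraps the stream (only if needed) so a detector's peek can be rewound.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Guesses the type from the first bytes; returns null if the JDK doesn't recognise them.
    // The JDK marks and resets the stream, so its position is unchanged afterwards.
    static String guess(InputStream markableStream) throws IOException {
        return URLConnection.guessContentTypeFromStream(markableStream);
    }
}
```

For an upload this would be `InputStream upload = StreamSniffer.markable(uploadStream);` then `StreamSniffer.guess(upload)`, after which `upload` can be processed or stored in full (`uploadStream` again standing for the servlet's stream).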
kslote1

You can check the Content-Type header field and look at the extension of the filename used. For everything else, you have to run more complex routines, like checking with Tika, etc.
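A sketch of that first step, assuming the client-declared Content-Type and the original filename are already available as strings (for example, from the multipart part): prefer the declared type, fall back to the JDK's extension table via `URLConnection.guessContentTypeFromName`, and return null so the caller can move on to content sniffing. Both values come from the client and can be wrong or forged, which is why content-based checks still matter. `DeclaredType.fromHeaderOrName` is a hypothetical helper.

```java
import java.net.URLConnection;

final class DeclaredType {

    // Returns the client-declared media type if it is informative, otherwise a guess
    // from the filename extension, otherwise null (the caller should sniff the content).
    static String fromHeaderOrName(String declaredContentType, String filename) {
        if (declaredContentType != null && !declaredContentType.isBlank()) {
            // Drop parameters such as "; charset=UTF-8"
            int semi = declaredContentType.indexOf(';');
            String type = (semi >= 0 ? declaredContentType.substring(0, semi) : declaredContentType).trim();
            // application/octet-stream is a generic fallback type, so it says little
            if (!type.equalsIgnoreCase("application/octet-stream")) {
                return type;
            }
        }
        if (filename != null) {
            return URLConnection.guessContentTypeFromName(filename);  // may be null
        }
        return null;
    }
}
```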

b_erb

If you are using a JAX-RS REST service, you can get it from the MultipartBody (Apache CXF's multipart API):

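import java.io.IOException;
import java.io.InputStream;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;
import org.apache.cxf.jaxrs.ext.multipart.Attachment;
import org.apache.cxf.jaxrs.ext.multipart.ContentDisposition;
import org.apache.cxf.jaxrs.ext.multipart.MultipartBody;

@POST
@Path( "/<service_path>" )
@Consumes( "multipart/form-data" )
public Response importShapeFile( final MultipartBody body ) throws IOException {
    String filename = null;
    InputStream stream = null;
    for ( Attachment attachment : body.getAllAttachments() )
    {
        ContentDisposition disposition = attachment.getContentDisposition();
        if ( disposition != null && PARAM_NAME.equals( disposition.getParameter( "name" ) ) )
        {
            filename = disposition.getParameter( "filename" );
            stream = attachment.getDataHandler().getInputStream();
            break;
        }
    }

    // Read extension from filename to get the file's type and
    // read the stream accordingly.
    return Response.ok().build(); // placeholder response
}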

Where PARAM_NAME is a string representing the name of the parameter holding the file stream.

crowmagnumb

I think this solves the problem:

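import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public String readIt(InputStream is) throws IOException {
    if (is != null) {
        // closing the reader also closes is
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\n");
            }
            return sb.toString();
        }
    }
    return "error: ";
}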

What does it return? For example, for a PNG: "♦PNG\n\n♦♦♦.....", for XML:

Quite useful: you can try string.contains() to check what it is.
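Decoding binary content as UTF-8 is likely why the PNG output above shows placeholder characters: a lone 0x89 byte is not valid UTF-8, so it becomes U+FFFD and the signature byte is lost. If a String-based check is wanted, a sketch that decodes only the header as ISO-8859-1 (which maps each byte 0x00-0xFF to the char with the same value) keeps the signature intact. `HeaderText` and `looksLikePng` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class HeaderText {

    // ISO-8859-1 maps every byte 0x00..0xFF to the char with the same code,
    // so no information is lost (unlike UTF-8, which rejects a lone 0x89).
    static String asLatin1(byte[] header) {
        return new String(header, StandardCharsets.ISO_8859_1);
    }

    // Checks for the full 8-byte PNG signature at the start of the header.
    static boolean looksLikePng(byte[] header) {
        return asLatin1(header).startsWith("\u0089PNG\r\n\u001A\n");
    }
}
```

Using startsWith on the first bytes rather than contains on the whole file also avoids false positives from text that merely mentions "PNG".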

Adrian G