12

How can I get the file type extension from a byte[] (Blob)? I'm reading files from a DB into a byte[], but I don't know how to detect the file extension automatically.

Blob blob = rs.getBlob(1);
byte[] bdata = blob.getBytes(1, (int) blob.length());
Lukas Knuth
senzacionale

6 Answers

13

You mean you want the extension of the file whose content the BLOB stores? So if the BLOB holds a JPEG file, you want "jpg"?

Generally speaking, that's not possible, because the bytes themselves do not record the original file name. You can make a fairly good guess with a heuristic such as Apache Tika's content detection.

A better solution, however, would be to store the MIME type (or original file extension) in a separate column, such as a VARCHAR.

aioobe
  • I'd use more than 3 characters. `.html`, `.java` and `.jpeg` are just 3 quite common file extensions with more than 3 characters. – Joachim Sauer Aug 19 '11 at 10:47
4

It's not perfect, but the Java Mime Magic library may be able to infer the file extension:

Magic.getMagicMatch(bdata).getExtension();
Adam Paynter
  • Hello Adam! I tried this in Spring app and the system warns me that there is no MagicParseException class (it is one of the exceptions it tells me to handle). What to do? Thanks! – tom Jan 10 '20 at 16:28
2
if (currentImageType == null) {
    // ByteArrayInputStream supports mark/reset, which both detectors rely on.
    ByteArrayInputStream is = new ByteArrayInputStream(image);
    String mimeType = URLConnection.guessContentTypeFromStream(is);
    if (mimeType == null) {
        // Fall back to Apache Tika when the JDK cannot identify the content.
        AutoDetectParser parser = new AutoDetectParser();
        Detector detector = parser.getDetector();
        Metadata md = new Metadata();
        mimeType = detector.detect(is, md).toString();
    }

    // Reduce the MIME type (e.g. "image/jpeg") to a short type name.
    if (mimeType.contains("png")) {
        mimeType = "png";
    } else if (mimeType.contains("jpg") || mimeType.contains("jpeg")) {
        mimeType = "jpg";
    } else if (mimeType.contains("pdf")) {
        mimeType = "pdf";
    } else if (mimeType.contains("tif")) {
        mimeType = "tif";
    }

    currentImageType = ImageType.fromValue(mimeType);
}
Akin Okegbile
  • `AutoDetectParser` is from [Apache Tika](https://tika.apache.org/2.3.0/api/org/apache/tika/parser/AutoDetectParser.html) by the way. – slindenau Jul 17 '22 at 07:44
2

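Try ByteArrayDataSource (http://download.oracle.com/javaee/5/api/javax/mail/util/ByteArrayDataSource.html). It has a getContentType() method, which might help, but I've never tried it personally. Note that getContentType() returns the type string passed to the constructor, so it may not detect anything from the bytes themselves.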

Kris
1

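An alternative to using a separate column is using magic numbers. Here is a sketch in Java:

import java.util.Arrays;

static String getFileExtn(byte[] blob) {
    // PNG files start with the bytes 0x89 'P' 'N' 'G'.
    byte[] pngMagic = {(byte) 0x89, 0x50, 0x4E, 0x47};
    if (blob.length >= pngMagic.length
            && Arrays.equals(Arrays.copyOf(blob, pngMagic.length), pngMagic)) {
        return ".png";
    }
    // More checks...
    return null; // unknown type
}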

You would have to do this for every file type you support. For some obscure file types you may have to find the magic number yourself with a hex editor (it is usually in the first few bytes of the file). The benefit of using the magic number is that you get the actual file type, not whatever extension the user decided to give the file.
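
One way to avoid a long chain of if statements when supporting several types is a small signature table. The following is only a sketch: the class name MagicNumberSniffer and the method guessExtension are hypothetical, and the table covers just four well-known signatures (PNG, JPEG, GIF, PDF). Formats that share a prefix (for example ZIP-based documents) or keep their signature at a later offset would need extra handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class MagicNumberSniffer {
    // Hypothetical signature table: extension -> bytes expected at offset 0.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put(".png", new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put(".jpg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put(".gif", new byte[] {0x47, 0x49, 0x46, 0x38});       // "GIF8"
        SIGNATURES.put(".pdf", new byte[] {0x25, 0x50, 0x44, 0x46, 0x2D}); // "%PDF-"
    }

    /** Returns an extension such as ".png", or null when no signature matches. */
    static String guessExtension(byte[] data) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(data, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    // True when data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessExtension(pdf));
    }
}
```

With the bdata array from the question, the call would be MagicNumberSniffer.guessExtension(bdata); a null result means the type is not in the table.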

Quoendithas
0

There is a decent method in the JDK's URLConnection class; please refer to the following answer: Getting A File's Mime Type In Java
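
For a byte[] read from a Blob, the URLConnection approach might look like the sketch below. It assumes the method meant here is URLConnection.guessContentTypeFromStream, which is the one used in another answer on this page; the class name BlobContentType and the helper guessMimeType are hypothetical. The JDK recognises only a limited set of formats and returns null for anything else.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

class BlobContentType {
    /** Returns a MIME type such as "image/png", or null when the JDK does not recognise the bytes. */
    static String guessMimeType(byte[] data) {
        // guessContentTypeFromStream needs a stream that supports mark/reset;
        // ByteArrayInputStream does.
        try (InputStream in = new ByteArrayInputStream(data)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The 8-byte PNG signature is enough for the JDK to identify the type.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(guessMimeType(png)); // prints "image/png"
    }
}
```

The result is a MIME type rather than an extension, so it still has to be mapped to something like "png" if an extension is what you need.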

Community
Yuriy Nakonechnyy