24

Is there a way to determine MS Office Excel file type in Apache POI? I need to know in what format is the Excel file: in Excel '97(-2007) (.xls) or Excel 2007 OOXML (.xlsx).

I suppose I could do something like this:

int type = PoiTypeHelper.getType(file);
switch (type) {
case PoiType.EXCEL_1997_2007:
   ...
   break;
case PoiType.EXCEL_2007:
   ...
   break;
default:
   ...
}

Thanks.

Alexey Berezkin
  • 1,513
  • 1
  • 10
  • 18

5 Answers5

49

Promoting a comment to an answer...

If you're going to be doing something special with the files, then rjokelai's answer is the way to do it.

However, if you're just going to be using the HSSF / XSSF / Common SS usermodel, then it's much simpler to have POI do it for you, and use WorkbookFactory to have the type detected and opened for you. You'd do something like:

 Workbook wb = WorkbookFactory.create(new File("something.xls"));

or

 Workbook wb = WorkbookFactory.create(request.getInputStream());

Then if you needed to do something special, test if it's a HSSFWorkbook or XSSFWorkbook. When opening the file, use a File rather than an InputStream if possible to speed things up and save memory.

If you don't know what your file is at all, use Apache Tika to do the detection - it can detect a huge number of different file formats for you.

Community
  • 1
  • 1
Gagravarr
  • 47,320
  • 10
  • 111
  • 156
25

You can use:

// For .xlsx
POIXMLDocument.hasOOXMLHeader(new BufferedInputStream( new FileInputStream(file) ));

// For .xls
POIFSFileSystem.hasPOIFSHeader(new BufferedInputStream( new FileInputStream(file) ));

These are essentially the methods that the WorkbookFactory#create(InputStream) uses for determining the type

Please note, that both method supports only streams supporting "mark" feature (or PushBackInputStream), so simple FileInputStream is not supported. Use BufferedInputStream as a wrapper. For this reason after the detection you can simply reuse the stream, since it will be reseted to the starting point.

Gábor Lipták
  • 9,646
  • 2
  • 59
  • 113
RJo
  • 15,631
  • 5
  • 32
  • 63
  • 1
    Thank you. This is the only way to detect file type when you are using low level Event based read only access to process large excel files. – SWilk Aug 09 '13 at 15:57
  • 1
    Added the information that only "mark" supporting streams can be used. Updated code sample. – Gábor Lipták Jul 16 '14 at 11:26
  • method `hasOOXMLHeader` is depricated within class `POIXMLDocument`, use it within class [DocumentFactoryHelper](https://poi.apache.org/apidocs/org/apache/poi/poifs/filesystem/DocumentFactoryHelper.html) – KAD Jul 11 '16 at 09:48
  • 1
    Deprecated methods – Saikat Apr 29 '18 at 15:06
3

This can be done using the FileMagic class. See below JavaDoc - https://poi.apache.org/apidocs/org/apache/poi/poifs/filesystem/FileMagic.html

Sample code snippet:

FileMagic.valueOf(inputStream).equals(FileMagic.OOXML) // XLSX

Saikat
  • 14,222
  • 20
  • 104
  • 125
1

Based on the lib implementation of org.apache.poi.ss.usermodel.WorkbookFactory#create(java.io.InputStream)

We can mimic the WorkbookFactory's logic, remove irrelevant bits and return file type instead.

public static TYPE fileType(File file) {
    try (
            InputStream inp = new FileInputStream(file)
    ) {
        if (!(inp).markSupported()) {
            return getNotMarkSupportFileType(file);
        }
        return getType(inp);
    } catch (IOException e) {
        LOGGER.error("Analyse FileType Problem.", e);
        return TYPE.INVALID;
    }
}

private static TYPE getNotMarkSupportFileType(File file) throws IOException {
    try (
            InputStream inp = new PushbackInputStream(new FileInputStream(file), 8)
    ) {
        return getType(inp);
    }
}

private static TYPE getType(InputStream inp) throws IOException {
    byte[] header8 = IOUtils.peekFirst8Bytes(inp);
    if (NPOIFSFileSystem.hasPOIFSHeader(header8)) {
        NPOIFSFileSystem fs = new NPOIFSFileSystem(inp);
        return fileType(fs);
    } else if (DocumentFactoryHelper.hasOOXMLHeader(inp)) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.INVALID;
}

private static TYPE fileType(NPOIFSFileSystem fs) {
    DirectoryNode root = fs.getRoot();
    if (root.hasEntry("EncryptedPackage")) {
        return TYPE.XSSF_WORKBOOK;
    }
    return TYPE.HSSF_WORKBOOK;

}

public enum TYPE {
    HSSF_WORKBOOK, XSSF_WORKBOOK, INVALID
}
Vladimir Vagaytsev
  • 2,871
  • 9
  • 33
  • 36
KoneCth
  • 37
  • 5
0

This is the way i have identified the requested file is of Office type.

public static boolean isOfficeDoc(String filePath) {
         FileMagic fileMagic = getFileMagicObj(filePath);
            return fileMagic != null && (fileMagic == FileMagic.OLE2 || fileMagic == FileMagic.OOXML);
        }

    private static FileMagic getFileMagicObj(String filePath) {

        try (InputStream is = new FileInputStream(filePath);
             InputStream magicIS = FileMagic.prepareToCheckMagic(is)) {

            return FileMagic.valueOf(magicIS);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return null;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
Amol Suryawanshi
  • 2,108
  • 21
  • 29