4

I've noticed that the workbook creation statement for xlsx files with Apache POI v3.10 e.g. `

Workbook wb = WorkbookFactory.create(inputStream) 

or

Workbook wb = new XSSFWorkbook(inputStream)

...is taking a long time (~30 seconds) and the file only has 72 rows with 10 columns (365KB).

It's not a problem, but it just seems a bit excessive. I'm wondering if I'm doing anything wrong or not doing something I should be doing. Instantiation of an xls file with the same data (but only 25KB) only takes 1 or 2 seconds. If this is normal, then could someone just let me know.

Edit:

This is the workbook creation code I'm using:

LOG.info("Loading Excel Workbook...");
Workbook workbook;
try {
    workbook = WorkbookFactory.create(dataStream);
} catch (InvalidFormatException e) {
    throw new IOException("Invalid file format ==> " + e.getMessage());
}
LOG.info("Workbook loaded.");

Just to be clear, dataStream is an InputStream. The 30 second delay occurs between the first and second log statements. As I said previously, I've tried replacing the factory with new XSSFWorkbook(dataStream) but the delay remains.

Edit-2:

I ran a standalone test which does nothing except the workbook initialization using 1) a File, and also 2) an InputStream where the source is the xlsx file I've been having trouble with. They both completed in ~2 seconds.

I should have added some background earlier. I'm using the Google App Engine. The input stream that I'm giving to POI is retrieved from a file upload to the server. App Engine doesn't support Servlet 3.0 (for handling file uploads) so I have to use Apache Commons FileUpload lib to retrieve the file data. Ultimately, the data I get is an InputStream retrieved from FileItemStream#openStream(). This is what I supply to POI.

So, I don't know if this is a problem with the App Engine, or if POI doesn't like the flavor of the InputStream that FileItemStream is returning. Incidentally, I cannot try the initialization with a File instead of a InputStream because App Engine doesn't allow writes to the file system.

RTF
  • 6,214
  • 12
  • 64
  • 132
  • Have you tried using a creating a new workbook in a constructor and calling on it `workbook.write` instead of `WorkbookFactory.create()`? Does it affect your execution time? What POI version are you using, there are a few performance improvements which have been instroduced in version 3.9: http://stackoverflow.com/questions/18957278/apache-poi-performance-issue-with-workbookfactory-create – Norbert Radyk Apr 28 '14 at 07:33
  • I'm using v3.10 and I've discovered that the delay only happens with xlsx files (in particular the 15-20 second init delay I mentioned in the question occurs with a 365KB xslx file). With a 25KB xls file (same data), the delay is only 1-2 seconds. I'm not sure what you mean by `workbook.write`. Could you give a small example please? – RTF Apr 28 '14 at 13:09
  • You can try similar approach to the one shown here: http://stackoverflow.com/questions/23305229/write-items-into-excel-cell/23327356#23327356 just using XSSF classes. – Norbert Radyk Apr 28 '14 at 13:14
  • I tried instantiating the XSSFWorkbook instead of using the factory, but it's the same delay (and it's actually more like 30 seconds). – RTF Apr 28 '14 at 13:26
  • 1
    Hmmm, the delay seems very strange, would you be able please to add the code you use for the XLSX creation? – Norbert Radyk Apr 28 '14 at 13:42
  • 1
    Is there a reason why you're using an InputStream? Only the [docs make it pretty clear](http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream) that a File will be faster and lower memory... – Gagravarr Apr 28 '14 at 13:55
  • Question is updated... – RTF Apr 28 '14 at 14:42
  • If you're having to upload a File, is it possible that the bulk of the wait is actually just transferring the whole excel object to the server so POI can access it? – Gagravarr Apr 28 '14 at 17:07
  • I seriously doubt it, I'm testing on a development server running on localhost. And it's only a 365KB file. Plus, I don't think it would get to the initialization of the workbook until the server had all the data. – RTF Apr 28 '14 at 20:25
  • 1
    See also: http://stackoverflow.com/questions/19341419/java-web-start-causing-slow-execution – Kolban Nov 06 '14 at 15:47

2 Answers2

1

I would do some profiling using one of the available profiling tools, e.g. JVisualVM, Dynatrace, JProfiler, ..

Only then you know for sure where the time is spent in your code, it might be some unexpected place after all and you would be chasing after the wrong horse here.

I.e. you might receive the InputStream from somewhere else and it might be actually a download from some external content via the Internet and the line might be slow and thus all the reading just takes ages. Or it might be something with the disk-setup or memory shortage where lots of GC is running because you are near the limit, ...

One other option would be to extract the smallest possible snippet of code which reproduces this, then you can see what else you need to remove to make it run faster.

centic
  • 15,565
  • 9
  • 68
  • 125
0

Use csv format instead. We had the same issue using POI library and we are now downloading the csv file instead of an excel file, which is very light and is downloaded quickly.