0

I am using jdbcTemplate to query to an ETL tool and fetch the results and write to an Excel Worksheet using Apache POI library. I have somewhere between 700,000 and 1,000,000 rows to fetch and write.

Earlier I had the following setup:

jdbcTemplate.query(new CustomPreparedStatementCreator(arg), new ResultSetExtractor<Void>() {
    @Override
    public Void extractData(ResultSet rs) {
    while (rs.next()) {... process sequentially ...}
    // caused the connection to remain open for 650 seconds
    // setFetchSize was not set so it was default
}

Now I have the following setup:

jdbcTemplate.setFetchSize(200);
SqlRowSet srs = jdbcTemplate.queryForRowSet(sql);
// all the rows are fetched in 60 seconds into memory now
while (srs.next()) {... process sequentially ...} // still takes 600 seconds to write to the worksheet

Now my problems are:

  1. How to I leverage the Java 8 parallelStream API to process all the rows inside a SqlRowSet object? Is there any alternative?
  2. How can I write faster to the worksheet (an org.apache.poi.xssf.streaming.SXSSFSheet) ?
sprkv5
  • 157
  • 1
  • 8
  • 1
    AFAIK, an `SXSSFSheet` is not thread safe so attempting to write to an `SXSSFSheet` in multiple threads will not work. To minimize the connection time, consider using a queue to put the domain objects and another thread to consume the queue which writes to the `SXSSFSheet`. But does it have to be Excel? Can it be CSV (which might be faster to create than Excel)? – Andrew S Sep 21 '18 at 17:57
  • I also have a CSV implmentation (using org.apache.commons.csv.CSVPrinter) and it's insanely fast. Writes the same amount of data in 30 seconds sequentially. But some "brilliant" users want Excel. – sprkv5 Sep 21 '18 at 19:29
  • The `apache poi` creates a `Workbook` which is then able providing all possibilities which `Excel` also provides (textual data but also pictures, shapes, charts, pivot tables, ...). This of course must be more costly than `CSV` which only provides textual data. So if only a `*.xlsx` file is needed filled with textual data, then using `apache poi` might be unnecessarily costly. In https://stackoverflow.com/questions/50738119/how-to-create-and-write-to-excel-file-xlsx/50886486#50886486 I have shown the simplest possible method for creating a simple `*.xlsx` file. – Axel Richter Sep 22 '18 at 06:23
  • The idea of *.xlsx file being a zip archive of xml text files helped me a lot. I was able to implement a multithreaded writer (months ago) which works fine and which gave me a speed-up of 2x for really big xlsx files containing a million rows and 250+ columns. Thanks a lot @axel-richter – sprkv5 Nov 20 '20 at 17:16

0 Answers0