
I need to write 600-700k records into an xlsx file using Apache POI. The code I am presently using is:

    import java.io.FileOutputStream;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;

    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.xssf.streaming.SXSSFSheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public void writeRecords(ResultSet rs) {
        try {
            SXSSFWorkbook wb = new SXSSFWorkbook();
            wb.setCompressTempFiles(true);

            SXSSFSheet sh = (SXSSFSheet) wb.createSheet("Sheet 1");
            sh.setRandomAccessWindowSize(100); // keep 100 rows in memory; exceeding rows are flushed to disk

            ResultSetMetaData rsmd = rs.getMetaData();
            int numColumns = rsmd.getColumnCount();

            // note: POI rows and cells are 0-based, so starting at index 1
            // leaves row 0 and column A empty
            Row heading = sh.createRow(1);
            for (int x = 0; x < numColumns; x++) {
                Cell cell = heading.createCell(x + 1);
                cell.setCellValue(rsmd.getColumnLabel(x + 1));
            }

            int rowNumber = 2;
            while (rs.next()) {
                Row row = sh.createRow(rowNumber);
                for (int y = 0; y < numColumns; y++) {
                    row.createCell(y + 1).setCellValue(rs.getString(y + 1));
                }
                rowNumber++;
            }

            FileOutputStream out = new FileOutputStream("C:/Users/test1.xlsx");
            wb.write(out);
            out.close();
            wb.dispose(); // delete the SXSSF temp files backing the flushed rows
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

It is working fine, but it is taking ~50 minutes to write ~65k records. The ResultSet of 65k records was fetched in 5-6 minutes.

Is there any way we can write 600,000-700,000 records in about 10-15 minutes using POI? We won't be able to export the data in CSV format, as the end users have setups that import xlsx files only. Regards, Tushar

  • How do you know the ResultSet was fetched in 7 seconds? In the code above you are still fetching while writing the rows. – Thilo Apr 23 '15 at 10:13
  • Also, how does Excel behave when you throw a spreadsheet with 700k rows at it? How big is that file? – Thilo Apr 23 '15 at 10:15
  • I logged timestamps before and after: ps = con.prepareStatement("select * from table_1 where rownum<65000"); rs = ps.executeQuery(); – Tushar Apr 23 '15 at 10:15
  • `executeQuery` does not include fetching the data. That happens while you call `next()` – Thilo Apr 23 '15 at 10:17
  • Ohh, OK... in PL/SQL Developer the data is displayed in about 5-6 minutes – Tushar Apr 23 '15 at 10:19
  • @Thilo - is it possible using some other API (freeware)? – Tushar Apr 23 '15 at 10:29
  • The default window size of `SXSSFWorkbook` is 100. You should match this value with the number of rows fetched by `rs.next()`, as Cristobal mentioned in his answer, to improve the overall time, since the GC might be able to clean up the rows while the data is fetched. – Jonathan Drapeau Apr 23 '15 at 14:38
  • Have you considered writing to a simple, plain CSV? Excel can open them too. – rpax Apr 26 '15 at 22:23

2 Answers

Check the fetchSize of the PreparedStatement. If it isn't explicitly set, the value may be very small compared with the size of the table, and the speed of queries over medium-to-large amounts of data can be severely affected.

Check this question for more information.
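
For illustration, here is a minimal sketch of raising the fetch size on the statement from the question's comments (`con` and the table name come from the question; the value 100 mirrors the SXSSF window size, as suggested in the comments above, and is an assumption to tune):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Sketch only: Oracle's JDBC driver fetches 10 rows per round trip by default,
    // so with large result sets each rs.next() batch stalls on the network.
    PreparedStatement ps = con.prepareStatement("select * from table_1");
    ps.setFetchSize(100); // e.g. match the SXSSF random access window size
    ResultSet rs = ps.executeQuery();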

Also, consider whether it is necessary to use setCompressTempFiles, or SXSSFWorkbook at all. If it is needed, the number of rows kept in memory will impact performance in a directly proportional way.
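
As a rough sketch of those trade-offs (the window size of 500 is an assumed tuning value, not a recommendation from this answer):

    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    // A larger row-access window means fewer flushes to the temp file but more heap used.
    SXSSFWorkbook wb = new SXSSFWorkbook(500); // keep 500 rows in memory before flushing
    wb.setCompressTempFiles(false);            // skip gzipping temp files if disk space allows

Whether compression helps depends on whether the job is CPU-bound or disk-bound, so it is worth measuring both settings.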


It would be very fast if you are able to write the file output from sqlplus.

Create a file mycsv.sql as below:

   SET DEFINE OFF
   SET ECHO OFF
   SET SERVEROUTPUT OFF
   SET TERMOUT OFF
   SET VERIFY OFF
   SET FEEDBACK OFF
   SET PAGESIZE 10000
   SET ARRAYSIZE 5000
   REM SET HEAD OFF
   SET LINE 500
   spool /tmp/mycsvfile.csv;
   select * from MY_table;
   spool off;
   exit;

and from a Linux prompt you can run it like:

$> sqlplus username/password @/tmp/mycsv.sql
  • Does not answer the question at all. – rpax Apr 26 '15 at 22:25
  • Hi, what I meant is: your final goal is to write the output to a file, then use sqlplus. Doing it via sqlplus will be faster. – Ramki Apr 27 '15 at 05:33
  • Hi. What I meant is, if your final goal is to write CSV output to a file, then use sqlplus. 1. Doing it in sqlplus will be faster. 2. If the data volume is very high, Apache POI might hit an out-of-memory error. 3. Instead of fetching data from the DB into Apache POI and then writing the file, remove the intermediate Apache POI layer; this is a good advantage. I hope this is clear. – Ramki Apr 27 '15 at 05:39