16

I am looking to write to an Excel (.xls, MS Excel 2003 format) file programmatically using Java. The output files may contain ~200,000 rows, which I plan to split over a number of sheets (64k rows per sheet, due to the Excel row limit).

I have tried the Apache POI API, but it seems to be a memory hog due to its object model: I am forced to add cells/sheets to the workbook object in memory, and only once all the data is added can I write the workbook to a file. Here is a sample of how Apache recommends writing Excel files with their API:

import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it
Row row = sheet.createRow(0);

// Create a cell and put a value in it
Cell cell = row.createCell(0);
cell.setCellValue(1);

// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();

Clearly, writing ~20k rows (with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space".

I have tried increasing the JVM initial and maximum heap sizes using the -Xms and -Xmx parameters, as -Xms512m and -Xmx1024m. I still can't write more than 150k rows to the file.

I am looking for a way to stream to an Excel file instead of building the entire file in memory before writing it to disk, which should save a lot of memory. Any alternative API or solution would be appreciated, but I am restricted to Java. Thanks! :)

Jaskirat
  • 1,114
  • 1
  • 8
  • 17
  • You can have a look at: http://stackoverflow.com/questions/6004379/java-write-excel-files-with-poi-event-model – ParagJ Jun 01 '12 at 09:14
  • Only 1024m? Try 4086 (4 gig). We run 8 gig vms at work at times). Spreadsheets are designed that way, can even excel work on just a part of a spreadsheet at once? – Bill K Aug 09 '13 at 02:06

9 Answers

10

Try the SXSSF workbook; it's a great fit for huge spreadsheet documents. It builds the document without eating RAM because it streams completed rows out to a temporary file instead of keeping the whole workbook in memory.
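A minimal sketch of this streaming approach (assuming POI 3.8+ on the classpath; note that, as a comment below points out, SXSSF produces .xlsx, not the .xls format the question asks for):

```java
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingWrite {
    public static void main(String[] args) throws Exception {
        // Keep only 100 rows in memory; older rows are flushed to a temp file.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        Sheet sheet = wb.createSheet("data");
        // .xlsx sheets hold up to 1,048,576 rows, so 200k fit in one sheet.
        for (int r = 0; r < 200_000; r++) {
            Row row = sheet.createRow(r);
            for (int c = 0; c < 20; c++) {
                row.createCell(c).setCellValue(r * 20 + c);
            }
        }
        try (FileOutputStream out = new FileOutputStream("workbook.xlsx")) {
            wb.write(out);
        }
        wb.dispose(); // delete the temporary backing files
    }
}
```

Heap usage stays roughly constant regardless of row count, since only the 100-row window is ever live.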

Serhii Bohutskyi
  • 2,261
  • 2
  • 28
  • 28
  • SXSSF doesn't support working with XLS: https://stackoverflow.com/questions/20678164/saving-sxssf-as-xls-file – Samy Omar Oct 17 '17 at 14:26
7

All existing Java APIs try to build the whole document in RAM at once. Try writing an XML file which conforms to the new xlsx file format instead. To get started, I suggest building a small file in the desired form in Excel and saving it. Then open it, examine the structure, and replace the parts you want.

Wikipedia has a good article about the overall format.
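A minimal sketch of the streaming idea this answer suggests: emit the worksheet part's SpreadsheetML row by row, so nothing accumulates in memory beyond the current row. A complete .xlsx is a zip archive that also needs [Content_Types].xml, workbook.xml, and relationship parts, all omitted here; the numeric cell values are just placeholders.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class SheetXmlWriter {
    // Stream one worksheet's XML row by row to any Writer (e.g. a
    // BufferedWriter over a zip entry in a real exporter).
    static void writeSheet(Writer w, int rows, int cols) throws IOException {
        w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        w.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
        w.write("<sheetData>");
        for (int r = 1; r <= rows; r++) {
            w.write("<row r=\"" + r + "\">");
            for (int c = 0; c < cols; c++) {
                // Inline numeric cells; the optional A1-style references are omitted
                w.write("<c t=\"n\"><v>" + (r * cols + c) + "</v></c>");
            }
            w.write("</row>");
        }
        w.write("</sheetData></worksheet>");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeSheet(out, 3, 2);
        System.out.println(out);
    }
}
```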

Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • Thanks! I did think of xlsx, but the clients use Office 2003 and xlsx becomes problematic. Also, they can't install the MS plugin to convert xlsx to xls. Two words - "Enterprise IT" :| – Jaskirat Sep 28 '09 at 09:46
  • Create the file, open it in Excel and save it with the old format. – Aaron Digulla Sep 28 '09 at 09:53
  • Reports would be generated on a daily/weekly basis, opening in excel and saving with old format is not really feasible. – Jaskirat Sep 28 '09 at 10:38
  • Ask an Excel buff to write you a macro for this. Put the macro into an Excel sheet and just double click the "convert sheet" to do the conversion. When that works, you can write a batch file to do it automatically. – Aaron Digulla Sep 28 '09 at 11:46
  • 2
    @Jass: You use Office 2003 but need to write 150k Rows? I thought only Office Versions from 2007 on are capable of using more than 65536 rows... http://en.wikipedia.org/wiki/Excel_2003 – HerdplattenToni Dec 21 '09 at 13:00
  • 1
    @Toni I had stated that "output files may contain ~200,000 rows which I plan to split over number of sheets (64k rows per sheet, due to the excel limit)." – Jaskirat Dec 22 '09 at 17:54
4

I had to split my output into several Excel files in order to overcome the heap space exception. I figured that around 5k rows with 22 columns was about the limit, so I made my logic end the current file every 5k rows, start a new one, and simply number the files accordingly.

In cases where I had 20k+ rows to write, I would end up with 4+ different files representing the data.
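The bookkeeping for this splitting scheme is simple arithmetic; a sketch using the answer's 5,000-rows-per-file figure (the report_<n>.xls naming mentioned in the comments is purely illustrative):

```java
public class FileSplitter {
    static final int ROWS_PER_FILE = 5_000;

    // Which output file a given zero-based record index belongs to.
    static int fileIndex(int record) {
        return record / ROWS_PER_FILE;
    }

    // Row position of the record inside its file.
    static int rowInFile(int record) {
        return record % ROWS_PER_FILE;
    }

    public static void main(String[] args) {
        int total = 20_000;
        // Ceiling division: how many files the whole export needs.
        int files = (total + ROWS_PER_FILE - 1) / ROWS_PER_FILE;
        System.out.println(files + " files for " + total + " records"); // 4 files
        // In the writing loop: whenever rowInFile(record) == 0, close the
        // current workbook (freeing its heap) and open report_<fileIndex>.xls.
    }
}
```

Closing each workbook before opening the next is what keeps the peak heap bounded by one file's worth of rows.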

Chris Dale
  • 2,222
  • 2
  • 26
  • 39
  • Yeah, I had that idea too: multiple Excel files, then maybe zip them into one archive. But it is not good enough. Anyway, thanks for the suggestion. – Jaskirat Dec 21 '09 at 16:15
  • If you ever come up with a solution I'd appreciate it if you update your question =) This is a very annoying problem. – Chris Dale Dec 22 '09 at 09:08
  • I'm facing the same problem now and am planning to split the data. Can I ask how you did it? – Akshar A K May 10 '13 at 10:45
3

Have a look at the HSSF serializer from the cocoon project.

The HSSF serializer catches SAX events and creates a spreadsheet in the XLS format used by Microsoft Excel.

sal
  • 23,373
  • 15
  • 66
  • 85
pgras
  • 12,614
  • 4
  • 38
  • 46
2

There is also JExcelApi, but it uses even more memory. I think you should create a .csv file and open it in Excel. That lets you pass a lot of data, but you won't be able to do any "Excel magic".

IAdapter
  • 62,595
  • 73
  • 179
  • 242
  • Delimited files are of course awesomely lightweight! But unfortunately this is not an option: no Excel magic, no data formatting, etc. I need to write .xls files. :( – Jaskirat Sep 28 '09 at 09:42
  • 1
    I think you should try to convince your boss to use .csv, because later you will have a lot of problems with slow-working Excel, and the whole webapp could stop working because 10 people are generating Excel reports. – IAdapter Sep 28 '09 at 10:25
2

Consider using the CSV format. That way you aren't limited by memory anymore - well, maybe only while prepopulating the data for the CSV, but that can be done efficiently as well: for example, query subsets of rows from the DB using LIMIT/OFFSET and immediately write each batch to the file, instead of hauling the entire DB table into Java memory before writing the first line. The Excel limit on the number of rows in one "sheet" will also increase to about one million.

That said, if the data actually comes from a DB, I would seriously reconsider whether Java is the right tool for this. Most decent DBs have an export-to-CSV function which can undoubtedly do this task much more efficiently. In the case of MySQL, for example, you can use the SELECT ... INTO OUTFILE statement for this.
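A sketch of the streaming CSV side of this suggestion (the orders table and page size in the comment are hypothetical; the JDBC paging itself is only indicated, not implemented):

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class CsvExport {
    // RFC 4180-style escaping: quote fields containing commas, quotes, or newlines.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Write one record and flush nothing extra: memory use is one row at a time.
    static void writeRow(Writer w, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) w.write(',');
            w.write(escape(fields.get(i)));
        }
        w.write("\r\n");
    }
    // In the export loop, fetch pages with e.g.
    //   SELECT ... FROM orders ORDER BY id LIMIT 1000 OFFSET ?
    // and call writeRow() per result row, so only one page is ever in memory.
}
```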

BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
1

We developed a Java library for this purpose, and it is currently available as an open source project: https://github.com/jbaliuka/x4j-analytic . We use it for operational reporting. We generate huge Excel files; ~200,000 rows should work without problems, and Excel manages to open such files too. Our code uses POI to load the template, but the generated content is streamed directly to the file without an XML or object model layer in memory.

jbaliuka
  • 249
  • 1
  • 10
0

Does this memory issue happen when you insert data into cells, or when you perform data computation/generation?

If you are going to load data into an Excel file with a predefined, static template format, then it is better to save a template and reuse it multiple times. Template cases typically arise when you generate a daily sales report, etc.

Otherwise, you need to create every new row, border, column, etc. from scratch each time.

So far, Apache POI is the only choice I have found.

"Clearly, writing ~20k rows(with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space"."

"Enterprise IT"

What you CAN do is perform batch data insertion. Create a queue-task table; after generating each page, rest for a few seconds, then continue with the next portion. If you are worried about the data changing while your queued task runs, you can first write the primary keys into the Excel file (hiding and locking that column from the user's view). The first run inserts the primary keys; from the second queued run onwards, the task reads them back from the file and processes the data portion by portion.

i need help
  • 2,386
  • 10
  • 54
  • 72
  • Why are we talking about task queues? :-S I don't really understand what you are trying to say. When you say use templates, do you want me to use the jxls API or something of that sort? – Jaskirat Sep 28 '09 at 10:40
0

We did something quite similar, with the same amount of data, and we had to switch to JExcelApi because POI is so heavy on resources. Try JExcelApi; you won't regret it when you have to manipulate big Excel files!

fvu
  • 32,488
  • 6
  • 61
  • 79
  • Thanks, I'll do a proof of concept with JExcelApi and see how it fares against POI. But the structure seems similar, so I don't really see how much of a difference it will make. Can you give me some ratios or numbers, if possible? – Jaskirat Sep 28 '09 at 10:42
  • As we ditched the POI based code quite some time ago for exactly the same problems as yours, sorry I can't. However, I just had a look at the Glassfish server that among other apps hosts the application that generates the .xls files. It has -Xmx768m, and we never had Excel generations that caused an oom error. I just generated 10 xls's of 45000 lines * 8 cols and there's 574,423,040 bytes on the heap. Hope this helps – fvu Sep 28 '09 at 11:55
  • Multiple xls files wouldn't be a problem, because once they are flushed to file, the objects can be garbage collected. So yeah, one Excel file of 45k rows x 8 cols would run under 768MB, but 150,000 rows x 20 cols would require more than 1GB, and I can't keep allocating memory exponentially; that would spell bad design. :D Anyway, thanks for all that! :) – Jaskirat Sep 28 '09 at 12:15