I would like to read an Excel file of up to 400k rows efficiently using Java or Spring. Right now we are using Apache POI to read and process the file, and it takes more than 15 minutes. I am running out of ideas. Can anyone please help me process this huge file efficiently using a Java-related tech stack?

EDIT: Is there a way to sort rows by a particular integer column, with minimal memory usage, using Apache POI?

adorearun
    Post your code. – secondbreakfast Mar 30 '17 at 18:52
  • 3
    Try streaming: http://stackoverflow.com/questions/33786219/apache-poi-streaming-sxssf-for-reading – Roman Puchkovskiy Mar 30 '17 at 18:53
  • Just an idea: try profiling the POI code. Maybe you create too many objects and have issues with memory allocation? Or maybe your loop is inefficient. When asking a question like "look, my code is slow", you should know exactly where it spends its time. – Mark Bramnik Mar 30 '17 at 18:57
  • According to this question, http://stackoverflow.com/questions/5992536/apache-poi-java-excel-performance-for-large-spreadsheets?rq=1, Apache POI works efficiently, at least when reading a file. So the problem most likely lies outside the POI code. – Ivan Pronin Mar 30 '17 at 18:58
  • 2
    Try Excel Streaming Reader: https://github.com/monitorjbl/excel-streaming-reader – Sundararaj Govindasamy Mar 30 '17 at 19:00
  • Thank you very much, I am able to process 500k records in under 2 minutes using the above library. – adorearun Mar 30 '17 at 21:29
  • Is there a way to sort rows by a particular integer column, with minimal memory usage, for 500k records using Apache POI? – adorearun Apr 03 '17 at 14:03

1 Answer

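You can use the OPCPackage and XSSFReader classes (POI's event-based reading API), which have a smaller memory footprint than loading the whole workbook. For sorting, note that a plain HashMap does not keep its entries in order, so use a sorted structure instead: for example a TreeMap keyed on the column value, or a list sorted with your own custom Comparator. You can see the example source here link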
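
For the sorting part, here is a minimal sketch of one low-memory approach. It is an illustration under assumptions, not code from the linked example or from Apache POI. It assumes you have already collected the integer column into an `int[]` during a streaming pass (one entry per row, in file order). The class name `RowOrderSketch` and the method `sortedRowOrder` are hypothetical. Instead of keeping row objects in memory, it packs each key together with its row index into a single `long` and sorts the primitive array, so 500k rows need only a few megabytes.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Apache POI. */
final class RowOrderSketch {

    /**
     * Returns the original row indices ordered by ascending key.
     * keys[i] is assumed to be the integer cell value read from row i.
     * Rows with equal keys keep their original relative order.
     */
    static int[] sortedRowOrder(int[] keys) {
        long[] packed = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            // High 32 bits: the key (sign preserved). Low 32 bits: the row index.
            packed[i] = ((long) keys[i] << 32) | (i & 0xFFFFFFFFL);
        }
        Arrays.sort(packed); // primitive sort, no per-row objects allocated

        int[] order = new int[keys.length];
        for (int i = 0; i < packed.length; i++) {
            order[i] = (int) packed[i]; // low 32 bits are the original row index
        }
        return order;
    }

    public static void main(String[] args) {
        int[] keys = {30, -5, 30, 7};
        System.out.println(Arrays.toString(sortedRowOrder(keys))); // prints [1, 3, 0, 2]
    }
}
```

The result only tells you the order in which rows should be emitted. Writing them out in that order (for example with a second pass over the file) is left out of this sketch.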

Mohit Gaur