
I've been reading the previous answers to this question and trying the code. None has helped so far: every attempt ends in a GC out-of-memory error.

I am running Eclipse 2019-09 on a Windows 10 PC with 12 GB of RAM. In Eclipse I deployed an internal WildFly 12 server, running on JDK 1.8.0_152. In the WildFly startup line, I changed the heap option from -Xmx512m to -Xmx2048m.
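For reference, when WildFly is started from its own scripts rather than from Eclipse, the heap setting lives in `bin\standalone.conf.bat`; the fragment below shows the shipped defaults with the same -Xmx change applied (exact contents vary by WildFly version, so treat this as an illustrative config fragment, not a verbatim copy):

```shell
rem # bin\standalone.conf.bat -- JVM memory allocation pool parameters.
rem # WildFly ships with -Xmx512M; raised here to 2 GB for the XLSX import.
set "JAVA_OPTS=-Xms64M -Xmx2048M -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m"
```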

I first tried plain POI, then added monitorjbl's streaming reader. I then stripped the code down to a bare row count but still can't read large files. My XLSX has 12 sheets, each with 500K rows. Every row has the same format: a serial number, two (string) IDs made of digits and dashes, and a (string) name column.

The program runs for about 12 minutes before throwing the memory error.

import com.monitorjbl.xlsx.StreamingReader;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

/**
 * Servlet implementation class StreamImport
 */
@WebServlet(description = "Import data using XLSX stream", urlPatterns = { "/StreamImport" })
public class StreamImport extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public StreamImport() {
        super();
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        final String strFile = "C:\\Users\\Hussain\\Downloads\\2019819515128395ATL_IT1.xlsx";
        Integer iRows = 0;

        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>Importing Using Stream</h1>");
        out.println("<p>Input Excel File: " + strFile + "</p>");
        // Open file
        FileInputStream fisFile = new FileInputStream(new File(strFile));

        // Create a streaming Workbook backed by the .xlsx file
        Workbook workbook = StreamingReader.builder()
                .rowCacheSize(100)   // number of rows to keep in memory (defaults to 10)
                .bufferSize(12000)   // buffer size used when reading the InputStream (defaults to 1024)
                .open(fisFile);      // InputStream or File for the XLSX file (required)

        // Get the first sheet from the workbook
        Sheet sheet = workbook.getSheetAt(0);

        for (Row row : sheet) {
            iRows++;
        }
        workbook.close();
        fisFile.close();
        out.println("Read " + iRows.toString() + " rows <br>");
        out.println("</body></html>");
    }
}

Some of the answers say that monitorjbl is no longer needed because SXSSF does its own streaming; others say that SXSSF is only for writing.
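For what it's worth, SXSSF (`org.apache.poi.xssf.streaming`) is indeed the write-side streaming API: it keeps only a sliding window of rows in memory while writing, and it does not read existing files. A minimal sketch of what it is for (the output file name and row count here are illustrative, not from the original post):

```java
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfWriteDemo {
    public static void main(String[] args) throws Exception {
        // Keep at most 100 rows in memory; older rows are flushed to a temp file.
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < 500_000; r++) {
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue(r);         // serial number
                row.createCell(1).setCellValue("ID-" + r); // string id
            }
            try (FileOutputStream out = new FileOutputStream("big.xlsx")) {
                wb.write(out);
            }
            wb.dispose(); // delete the temporary backing files
        }
    }
}
```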

Edit: Yes, as I said, I know this is a duplicate question. First, I don't know whether the monitorjbl solution has been superseded by SXSSF. Secondly, after following the other solutions, I am still getting out-of-memory errors.

  • `-Xmx2048m` - why not increase it more? – Scary Wombat Sep 25 '19 at 05:26
  • I am currently working on an application which uses `-Xmx12g` to work read files up to 15MB. `-Xmx2048m` is nothing for apache-poi – XtremeBaumer Sep 25 '19 at 06:16
  • The final version is going to run on a production server which has, well, production stuff. If my XLSX process takes up gigs of RAM, all other processes are going to suffer. That's why I need to run it in as little RAM as possible. – Hussain Akbar Sep 25 '19 at 07:00
  • @ScaryWombat Can you kindly post links to a few? All the ones I tried didn't solve my problem. – Hussain Akbar Sep 25 '19 at 09:54
  • *Loading should be done using XSSF without inputstream'ing the file (to avoid a full load of it in memory)* as per the link - have you tried? – Scary Wombat Sep 25 '19 at 23:59
  • @ScaryWombat The examples use InputStream to pass to XSSF. Kindly point me to the correct one. – Hussain Akbar Sep 27 '19 at 07:22
  • I am now trying reading using SAX method as per https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java. Just iterating / counting rows is taking 1 second for 10K rows. With my actual file containing 12 sheets and 2.5M rows, it is taking 34 secs. Yay! Target achieved. No need to increase the RAM. – Hussain Akbar Sep 27 '19 at 08:48
  • @HussainAkbar can you please share your solution in details... I'm also facing same kind of problem. – Abdul Majid Bajwa Jan 26 '21 at 16:48
  • @AbdulMajidBajwa I think this is the version that went into production: https://docs.google.com/document/d/11TgqUoHzmsAfcB2aUY1QFZsWpxIKgREH1Z2gT_oWmbE/edit?usp=sharing – Hussain Akbar Jan 28 '21 at 05:51
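The SAX/event approach mentioned in the last comments can be sketched as follows, modeled on POI's XLSX2CSV example (the class name `RowCounter` and the command-line file argument are illustrative; only the running row count is held in memory, so the heap stays flat regardless of file size):

```java
import java.io.InputStream;

import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxRowCount {

    // Receives SAX callbacks per row/cell; only the row count is kept in memory.
    static class RowCounter implements SheetContentsHandler {
        long rows = 0;

        @Override public void startRow(int rowNum) { rows++; }
        @Override public void endRow(int rowNum) { }
        @Override public void cell(String cellReference, String formattedValue, XSSFComment comment) { }
        @Override public void headerFooter(String text, boolean isHeader, String tagName) { }
    }

    public static void main(String[] args) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(args[0], PackageAccess.READ)) {
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            XSSFReader reader = new XSSFReader(pkg);
            StylesTable styles = reader.getStylesTable();

            RowCounter counter = new RowCounter();
            XSSFReader.SheetIterator sheets = (XSSFReader.SheetIterator) reader.getSheetsData();
            while (sheets.hasNext()) { // iterate all sheets in the workbook
                try (InputStream sheetStream = sheets.next()) {
                    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                    parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, counter, false));
                    parser.parse(new InputSource(sheetStream));
                }
            }
            System.out.println("Read " + counter.rows + " rows");
        }
    }
}
```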
