
I am developing a web app which takes a zip file, uploaded by the user, unzips it on the server, and processes the files. It works like a charm when the zip file is not too large (20-25MB), but if the file is around or over 50MB, it produces an OutOfMemoryError.

I have tried to increase the Java maximum memory allocation pool by adding export CATALINA_OPTS="-Xmx1024M" to startup.sh in Tomcat 7, but the error still persists.

AFAIK, the problem is in unzipping the .zip file. top shows that Tomcat uses 800MB of memory during the extraction of the 50MB file. Is there any solution to enable up to ~200MB uploads while efficiently using the available memory?

The code for unzipping is as follows:

package user;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class unzip {

public void unzipFile(String filePath, String oPath)
{

    FileInputStream fis = null;
    ZipInputStream zipIs = null;
    ZipEntry zEntry = null;
    try {
        fis = new FileInputStream(filePath);
        zipIs = new ZipInputStream(new BufferedInputStream(fis));
        while((zEntry = zipIs.getNextEntry()) != null){
            try{
                byte[] tmp = new byte[8*1024];
                FileOutputStream fos = null;
                String opFilePath = oPath+zEntry.getName();
                System.out.println("Extracting file to "+opFilePath);
                fos = new FileOutputStream(opFilePath);
                int size = 0;
                while((size = zipIs.read(tmp)) != -1){
                    fos.write(tmp, 0 , size);
                }
                fos.flush();
                fos.close();
            }catch(Exception ex){

            }
        }
        zipIs.close();
        fis.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
}

The error is as follows:

HTTP Status 500 - javax.servlet.ServletException: java.lang.OutOfMemoryError: Java heap space

type Exception report

message javax.servlet.ServletException: java.lang.OutOfMemoryError: Java heap space

description The server encountered an internal error that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: javax.servlet.ServletException: java.lang.OutOfMemoryError: Java heap space
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:549)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:455)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)

root cause

javax.servlet.ServletException: java.lang.OutOfMemoryError: Java heap space
    org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:916)
    org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:845)
    org.apache.jsp.Upload_jsp._jspService(Upload_jsp.java:369)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)

root cause

java.lang.OutOfMemoryError: Java heap space
    org.apache.commons.io.output.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:322)
    org.apache.commons.io.output.DeferredFileOutputStream.getData(DeferredFileOutputStream.java:213)
    org.apache.commons.fileupload.disk.DiskFileItem.getSize(DiskFileItem.java:289)
    org.apache.jsp.Upload_jsp._jspService(Upload_jsp.java:159)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)

note The full stack trace of the root cause is available in the Apache Tomcat/7.0.52 (Ubuntu) logs.
Apache Tomcat/7.0.52 (Ubuntu)

Surprisingly, there was nothing in the catalina.out file regarding this exception.

Thanks in advance.

EDIT: The file-upload (DiskFileItem) code in Upload.jsp:

//necessary imports go here
File file ;
int maxFileSize = 1000 * 1000 * 1024;
int maxMemSize = 1000 * 1024;
ServletContext context = pageContext.getServletContext();
String filePath = context.getInitParameter("file-upload");
String contentType = request.getContentType();
if(contentType != null)
{
  if ((contentType.indexOf("multipart/form-data") >= 0)) 
  {
  DiskFileItemFactory factory = new DiskFileItemFactory();
  factory.setSizeThreshold(maxMemSize);
  factory.setRepository(new File("/tmp/"));
  ServletFileUpload upload = new ServletFileUpload(factory);
  upload.setSizeMax( maxFileSize );
  try{ 
     List fileItems = upload.parseRequest(request);
     Iterator i = fileItems.iterator();
     while (i.hasNext ()) 
     {

        FileItem fi = (FileItem)i.next();
        if ( !fi.isFormField () )   
        {
           String fieldName = fi.getFieldName();
           String fileName = fi.getName();
           if(fileName.endsWith(".zip")||fileName.endsWith(".pdf")||fileName.endsWith(".doc")||fileName.endsWith(".docx")||fileName.endsWith(".ppt")||fileName.endsWith(".pptx")||fileName.endsWith(".html")||fileName.endsWith(".htm")||fileName.endsWith(".epub")||fileName.endsWith(".djvu"))
           {
              boolean isInMemory = fi.isInMemory();
              long sizeInBytes = fi.getSize();            
              new File(filePath+fileName).mkdir();
              filePath = filePath+fileName+"/";
              file = new File( filePath + fileName.substring( fileName.lastIndexOf("/"))) ;
              fi.write(file);
              String fileExtension = FilenameUtils.getExtension(fileName);
              if(fileExtension.equals("zip"))
              {
                 System.out.println("In zip.");
                 unzip mfe = new unzip();
                 mfe.unzipFile(filePath+fileName,filePath);
                 File zip = new File(filePath+fileName);
                 zip.delete();
              }
              File corePath = new File(filePath);
              int count=0;
           //some more processing
           }
        }
     }
  }
  catch(Exception e)
  {
     //exception handling goes here      
}
  }
}
Samuel Bushi
  • It seems you're using Java 7. Java 8 handles these kinds of problems by itself without any extra configuration by the user. – The Coder Jun 01 '15 at 08:35
  • You should really handle the exception in the inner loop. You don't even close the files if something bad happens. – Denys Séguret Jun 01 '15 at 08:37
  • You are trying to handle a huge file, so you are getting such a memory error. – Ahmet Karakaya Jun 01 '15 at 08:38
  • @mmc18 Dealing with large files **might** consume a lot of memory; but it doesn't necessarily have to! It very much depends on **what** and **how** things are done. Long story short - your comment is not helpful; and beyond that: making such a bold statement is simply not based on facts. – GhostCat Jun 01 '15 at 08:44
  • possible duplicate of [Is there any way to handle Java heap space exception](http://stackoverflow.com/questions/30401124/is-there-any-way-to-handle-java-heap-space-exception) – Rajesh Jun 01 '15 at 09:47
  • @blumonkey I've updated my answer – Svetlin Zarev Jun 02 '15 at 17:58

4 Answers


The issue is not in the unzip code you posted. The root cause is in:

java.lang.OutOfMemoryError: Java heap space
    org.apache.commons.io.output.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:322)
    org.apache.commons.io.output.DeferredFileOutputStream.getData(DeferredFileOutputStream.java:213)
    org.apache.commons.fileupload.disk.DiskFileItem.getSize(DiskFileItem.java:289)

Do you notice the ByteArrayOutputStream.toByteArray? So it seems that you are writing to a ByteArrayOutputStream which grows too much. Please locate and post the code which uses this ByteArrayOutputStream, as your zip code does not use such a thing.


Update: From the code you've posted, it seems that your code is OK. But the FileItem.getSize() call does some nasty things:

283   public long getSize() {
284        if (size >= 0) {
285            return size;
286        } else if (cachedContent != null) {
287            return cachedContent.length;
288        } else if (dfos.isInMemory()) {
289            return dfos.getData().length;
290        } else {
291            return dfos.getFile().length();
292        }
293    }

If the file item's data is stored in memory, it calls getData(), which calls toByteArray():

209    public byte[] getData()
210    {
211        if (memoryOutputStream != null)
212        {
213            return memoryOutputStream.toByteArray();
214        }
215        return null;
216    }

Which in turn allocates a new array:

317    public synchronized byte[] toByteArray() {
318        int remaining = count;
319        if (remaining == 0) {
320            return EMPTY_BYTE_ARRAY; 
321        }
322        byte newbuf[] = new byte[remaining];
           //Do stuff
333        return newbuf;
334    }

So for a short time you have twice the normal memory consumption.

I would recommend that you:

  1. Set the maxMemSize to no more than 8-32 KB (see the sketch after this list).

  2. Give more memory to the JVM process: -Xmx2g for example

  3. Make sure that you are not holding any unnecessary references to FileItems, as in your current configuration they consume a lot of memory.

  4. If OOM happens again, take a heap dump. You can use the -XX:+HeapDumpOnOutOfMemoryError JVM flag to automatically create a heap dump for you. Then you can use a heap dump analyzer (for instance Eclipse MAT) to check what is allocating so much memory and where it is being allocated.
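
To illustrate points 1 and 3, here is a minimal sketch of the upload handling, assuming Commons FileUpload 1.3 (where parseRequest is generic) and the same DiskFileItemFactory/ServletFileUpload classes already used in the JSP. The UploadSketch class name, the handleUpload and uploadDir names, and the 16 KB / ~200 MB values are placeholders, not taken from the question's code:

import java.io.File;
import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadSketch {

    // Sketch only: keep the in-memory threshold small, let large items spill to disk,
    // and take the size from the written file instead of calling FileItem.getSize().
    public static void handleUpload(HttpServletRequest request, String uploadDir) throws Exception {
        DiskFileItemFactory factory = new DiskFileItemFactory();
        factory.setSizeThreshold(16 * 1024);           // spill to disk after 16 KB
        factory.setRepository(new File("/tmp/"));      // temp storage for large items

        ServletFileUpload upload = new ServletFileUpload(factory);
        upload.setSizeMax(200L * 1024 * 1024);         // reject requests over ~200 MB

        List<FileItem> items = upload.parseRequest(request);
        for (FileItem fi : items) {
            if (fi.isFormField()) {
                continue;
            }
            File target = new File(uploadDir, fi.getName());
            fi.write(target);                          // streams the item to disk
            System.out.println("Stored " + target + " (" + target.length() + " bytes)");
            fi.delete();                               // free the temp file / in-memory buffer
        }
    }
}

The important part is that nothing ever asks the DiskFileItem for its in-memory data, so the toByteArray() copy from the stack trace is never made.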

Svetlin Zarev
  • Someone had probably just tried to upload the file again and Tomcat could not allocate memory for that file due to a memory issue likely caused by the zip file extraction (or anything else that occurred on that server). Wrong pointer. – defectus Jun 01 '15 at 09:06
  • Do you know at least some Java? **Do you know what a stack trace is?** Are you familiar with how Java manages memory? – Svetlin Zarev Jun 01 '15 at 09:07
  • Note that this stack trace points to the Apache Commons FileUpload API, not to code that is home grown. Apparently, for some reason, the file upload wants to tank the entire file into memory just to be able to get the file size. – Gimby Jun 01 '15 at 09:13
  • @Gimby - I guess the rest of the stack trace is visible in the server logs, as the last message says :) – Svetlin Zarev Jun 01 '15 at 10:43
  • The Apache doc says the following, so decreasing maxMemSize would store the file on disk rather than in memory, trading it for speed of data access. Am I right? `public void setSizeThreshold(int sizeThreshold) Sets the size threshold beyond which files are written directly to disk.` I have decreased it, and the error doesn't show up, for now :| . I would try the other suggestions, in case it comes back. – Samuel Bushi Jun 02 '15 at 19:54

The issue is that when the user uploads a zip file, the entire zip file gets read into memory. From the stack trace, the error is thrown while making a call to

DiskFileItem.getSize()

From the source code of DiskFileItem, DiskFileItem.getSize() gets all the data first:

283    public long getSize() {
284        if (size >= 0) {
285            return size;
286        } else if (cachedContent != null) {
287            return cachedContent.length;
288        } else if (dfos.isInMemory()) {
289            return dfos.getData().length;
290        } else {
291            return dfos.getFile().length();
292        }
293    }

By looking at the documentation of DeferredFileOutputStream.getData():

Returns either the output file specified in the constructor or the temporary file created or null.
If the constructor specifying the file is used then it returns that same output file, even when the threshold has not been reached.
If the constructor specifying a temporary file prefix/suffix is used then the temporary file created once the threshold is reached is returned. If the threshold was not reached then null is returned.

Returns:
    The file for this output stream, or null if no such file exists.

Ideally, the user should not be allowed to upload a file of arbitrary size; there should be a maximum size limit based on your server capacity.
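
As a rough sketch of such a limit, assuming Commons FileUpload 1.2.1 or later (which has setFileSizeMax in addition to setSizeMax); the helper name newLimitedUpload and the 200 MB / 50 MB values are only placeholders:

// Sketch: enforce request-level and per-file limits before anything is buffered.
ServletFileUpload newLimitedUpload(DiskFileItemFactory factory) {
    ServletFileUpload upload = new ServletFileUpload(factory);
    upload.setSizeMax(200L * 1024 * 1024);      // whole multipart request <= ~200 MB
    upload.setFileSizeMax(50L * 1024 * 1024);   // each individual file <= ~50 MB
    return upload;
}

With these limits set, parseRequest(request) fails fast with FileUploadBase.SizeLimitExceededException (or FileSizeLimitExceededException) instead of buffering an oversized upload.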

TheCodingFrog

Allocating 8MB for each zip entry seems to be just a finger-in-the-air approach. Try to use smaller buffers, say no more than 1kb. Garbage collection doesn't occur continuously.

Try to use this approach:

int BUFFER_SIZE = 1024;
int size;
byte[] buffer = new byte[BUFFER_SIZE];

...
FileOutputStream out = new FileOutputStream(path, false);
BufferedOutputStream fout = new BufferedOutputStream(out, BUFFER_SIZE);

while ( (size = zin.read(buffer, 0, BUFFER_SIZE)) != -1 ) {
   fout.write(buffer, 0, size);
}
fout.close();
defectus
  • Completely wrong. He is allocating 8 **KB** not MB!!! Also, when the memory runs out the GC will kick in, so your remark about the GC is plain stupid. Reducing the buffer will not help, but it will degrade performance. The optimal buffer size is between 4-8k. Also, most (if not all) JDK system classes use a buffer of 8k. – Svetlin Zarev Jun 01 '15 at 08:48
  • My bad. Still, allocating 8kB on every loop looks inefficient. And as GC runs in its own thread, it could happen that by the time GC finishes, the allocating thread has already used that freed memory. – defectus Jun 01 '15 at 09:14
  • Well, GC *stops the world*, so this cannot happen. – Svetlin Zarev Jun 01 '15 at 10:45

It seems like your while loop is allocating too much memory.

Check the number of times it iterates to decide.

Mainly, this line below is the cause:

byte[] tmp = new byte[8*1024];

You can try to reduce 1024 to something like 10 and see if it still happens.
Also check the file size.
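
For comparison, a minimal sketch that allocates the buffer once, outside the entry loop, instead of shrinking it; it reuses the variable names (zipIs, oPath) from the question's unzip code and assumes Java 7 for try-with-resources:

// Sketch: one reusable 8 KB buffer for all entries; try-with-resources closes each output file.
byte[] tmp = new byte[8 * 1024];
ZipEntry zEntry;
while ((zEntry = zipIs.getNextEntry()) != null) {
    try (FileOutputStream fos = new FileOutputStream(oPath + zEntry.getName())) {
        int size;
        while ((size = zipIs.read(tmp)) != -1) {
            fos.write(tmp, 0, size);
        }
    }
}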

a_z
  • Creating a new buffer on each iteration is indeed stupid, but it cannot cause an OOM error, as the GC would be able to collect the previous buffer. Reducing the buffer size will not help - it will only degrade performance. – Svetlin Zarev Jun 01 '15 at 08:45
  • It still needs to be checked, as I see it from here. – a_z Jun 01 '15 at 08:46