I am reading a ZIP file using Java as below:
Enumeration<? extends ZipEntry> zes = zip.entries();
while (zes.hasMoreElements()) {
    ZipEntry ze = zes.nextElement();
    // do stuff...
}
I am getting an out of memory error; the ZIP file is about 160 MB. The stack trace is below:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:88)
at java.util.zip.ZipFile$1.<init>(ZipFile.java:229)
at java.util.zip.ZipFile.getInputStream(ZipFile.java:229)
at java.util.zip.ZipFile.getInputStream(ZipFile.java:197)
at com.aesthete.csmart.batches.batchproc.DatToInsertDBBatch.zipFilePass2(DatToInsertDBBatch.java:250)
at com.aesthete.csmart.batches.batchproc.DatToInsertDBBatch.processCompany(DatToInsertDBBatch.java:206)
at com.aesthete.csmart.batches.batchproc.DatToInsertDBBatch.run(DatToInsertDBBatch.java:114)
at java.util.TimerThread.mainLoop(Timer.java:534)
at java.util.TimerThread.run(Timer.java:484)
How do I enumerate the contents of a big ZIP file without having to increase my heap size? Also, when I don't enumerate the contents and just access a single file, like this:
ZipFile zip = new ZipFile(zipFile);
ZipEntry ze = zip.getEntry("docxml.xml");
then I don't get an out of memory error. Why does this happen? How does a ZipFile handle its zip entries? The other option would be to use a ZipInputStream. Would that have a smaller memory footprint? I would eventually need to run this code on a micro EC2 instance on the Amazon cloud (613 MB RAM).
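For reference, this is roughly what I imagine the ZipInputStream version would look like (an untested sketch; the per-entry processing is elided):

import java.io.FileInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Read the archive sequentially: only one entry is open at a time,
// so only one Inflater buffer should be live at any moment.
ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile));
try {
    ZipEntry ze;
    while ((ze = zis.getNextEntry()) != null) {
        // read the current entry's data from zis here
        zis.closeEntry();
    }
} finally {
    zis.close();
}

Would that actually keep the memory footprint down?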
EDIT: Providing more information on how I process the zip entries after I get them:
Enumeration<? extends ZipEntry> zes = zip.entries();
while (zes.hasMoreElements()) {
    ZipEntry ze = zes.nextElement();
    S3Object s3Object = new S3Object(bkp.getCompanyFolder() + map.get(ze.getName()).getRelativeLoc());
    s3Object.setDataInputStream(zip.getInputStream(ze));
    s3Object.setStorageClass(S3Object.STORAGE_CLASS_REDUCED_REDUNDANCY);
    s3Object.addMetadata("x-amz-server-side-encryption", "AES256");
    s3Object.setContentType(Mimetypes.getInstance().getMimetype(s3Object.getKey()));
    s3Object.setContentDisposition("attachment; filename=" + FilenameUtils.getName(s3Object.getKey()));
    s3objs.add(s3Object);
}
I get the input stream from the ZipEntry and store it in the S3Object. I collect all the S3Objects in a list and then finally upload them to Amazon S3. For those who don't know Amazon S3: it's a file storage service; you upload files via HTTP.
I am thinking this might be happening because I collect all the individual input streams. Would it help if I batched them up, say 100 input streams at a time? Or would it be better to unzip the archive first and upload the unzipped files rather than holding streams?
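Concretely, something like this is what I had in mind for batching (a sketch reusing the variables from the loop above; uploadBatch is a hypothetical helper wrapping my existing upload code, and 100 is an arbitrary batch size):

List<S3Object> s3objs = new ArrayList<S3Object>();
Enumeration<? extends ZipEntry> zes = zip.entries();
while (zes.hasMoreElements()) {
    ZipEntry ze = zes.nextElement();
    S3Object s3Object = new S3Object(bkp.getCompanyFolder() + map.get(ze.getName()).getRelativeLoc());
    s3Object.setDataInputStream(zip.getInputStream(ze));
    // ... same storage class / metadata / content-type calls as above ...
    s3objs.add(s3Object);
    if (s3objs.size() == 100) { // arbitrary batch size
        uploadBatch(s3objs);    // hypothetical: performs the S3 puts and closes each entry's stream
        s3objs.clear();         // drop references so the finished streams can be collected
    }
}
if (!s3objs.isEmpty()) {
    uploadBatch(s3objs);
}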