I am attempting to load 69,930 files into a basic text editor. This goes smoothly, and after they are all loaded the memory sits at a very cool 130MB. During loading, however, usage peaks at around 900-1200MB.
Nearly all of that memory is retained through the Inflater#buf field. That buffer is only needed while a file is being read into the object model; after that it is never used again and the bytes could be released.
Obviously the extra memory is all reclaimed by the garbage collector soon after loading, so there is no memory leak. It just seems unnecessary to use so much extra memory in the meantime.
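As far as I can tell, the buffer stays reachable because an Inflater holds on to whatever byte[] it was last handed via setInput() until it is reset or ended. A stripped-down illustration of that retention (the 8MB array and the class name are just stand-ins, not my loading code):

import java.util.zip.Inflater;

// Illustrative only: an Inflater keeps a reference to the byte[] last passed
// to setInput() (the buf field seen in the heap dump) until reset()/end().
public class InflaterBufferDemo {
    public static void main(String[] args) {
        byte[] compressedData = new byte[8 * 1024 * 1024]; // stand-in for real zip data

        Inflater inflater = new Inflater(true);
        inflater.setInput(compressedData); // 'compressedData' is now held by the inflater
        try {
            // ... inflate() calls would go here ...
        } finally {
            inflater.end(); // drops the buffer reference and frees native state
        }
    }
}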
What I have tried:
- The memory issue is 'resolved' by making a System.gc() call immediately after closing the ZipFile (sketched below this list). This results in ~75% monitor time on the threads, high CPU usage and slow load times.
- Reducing the thread pool count. This reduced the impact (to 300MB), yet resulted in significantly longer load times.
- WeakReference
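For reference, the System.gc() attempt looked roughly like this (simplified; loadIntoModel is a placeholder for the real parsing code, not an actual method of mine):

private void loadWithForcedGc(final Source source) throws IOException {
    try (InputStream input = source.open()) {
        loadIntoModel(input); // placeholder for the real parsing code
    }
    // Releases the Inflater buffers promptly, but stalls every loader thread
    // and sends CPU usage through the roof.
    System.gc();
}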
What I have so far:
I call the load through a thread pool with 4 threads, each task performing the relatively simple steps below (the pool setup itself is sketched after this snippet):
// Source source = ...;
final InputStream input = source.open();
try {
    // read into object model
} finally {
    input.close();
}
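The pool itself is nothing special; it is roughly equivalent to the following (this is a reconstruction for the question, not my exact code):

import java.io.InputStream;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Loader {
    // A fixed pool of 4 workers; each task opens one Source, reads it into
    // the object model and closes it again.
    public void loadAll(final List<Source> sources) throws InterruptedException {
        final ExecutorService pool = Executors.newFixedThreadPool(4);
        for (final Source source : sources) {
            pool.submit(() -> {
                try (InputStream input = source.open()) {
                    // read into object model
                }
                return null; // Callable<Void>, so the checked IOException is allowed
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}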
The Source in this case is a ZipFileSource, which does all the reading:
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileSource implements Source {

    private final String file;
    private final String name;
    private volatile ZipFile zip;

    public ZipFileSource(final String file, final String name) {
        this.file = file;
        this.name = name;
    }

    @Override
    public InputStream open() throws IOException {
        // Close any previously opened ZipFile before opening a new one.
        close();
        final ZipFile zipFile = new ZipFile(file);
        final ZipEntry entry = zipFile.getEntry(name);
        final InputStream stream = new ZipFileSourceZipInputStream(zipFile.getInputStream(entry));
        this.zip = zipFile;
        return stream;
    }

    @Override
    public void close() throws IOException {
        if (zip != null) {
            zip.close();
            zip = null;
        }
    }

    // Wraps the entry stream so that closing it also closes the backing ZipFile.
    private class ZipFileSourceZipInputStream extends InputStream {

        private final InputStream stream;

        ZipFileSourceZipInputStream(final InputStream stream) {
            this.stream = stream;
        }

        @Override
        public int read() throws IOException {
            return stream.read();
        }

        @Override
        public void close() throws IOException {
            ZipFileSource.this.close();
            stream.close();
        }
    }
}
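Each loaded file is backed by one of these; a usage sketch (the zip path and entry name here are made up for illustration):

// Hypothetical usage: one ZipFileSource per file shown in the editor.
final Source source = new ZipFileSource("/path/to/project.zip", "docs/readme.txt");
try (InputStream input = source.open()) {
    // read into object model
}
// Closing the stream also closes the backing ZipFile (see close() above).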
I'm running a bit short on ideas. It seems to come down to either using a native zip extractor, pausing the loaders every n requests to make a System.gc() call (sketched below), or just giving up and letting it do its thing.
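For completeness, the "every n requests" idea would look something like this (the class name and the LOADS_PER_GC threshold are arbitrary placeholders):

import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the "force a GC every n loads" idea; the counter is shared by
// all loader threads.
public class PeriodicGc {
    private static final int LOADS_PER_GC = 1_000; // arbitrary placeholder
    private final AtomicInteger loads = new AtomicInteger();

    public void onFileLoaded() {
        if (loads.incrementAndGet() % LOADS_PER_GC == 0) {
            System.gc(); // still stalls the loaders, just less often
        }
    }
}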