
I am attempting to load 69,930 files into a basic text editor. This goes smoothly, and after they are all loaded the memory sits at a very cool 130MB. However, during peak loading it can hit 900MB - 1200MB.

The extra memory is almost entirely held by the Inflater#buf field. The buffer is used only while loading the file into the object model; after that it is never touched again and the bytes could be cleared.

Obviously, the extra memory is all cleared by the garbage collector soon after loading - so no memory leaks. However, it just seems unnecessary to use so much extra memory.

What I have tried:

  1. The memory issue is 'resolved' by making a System.gc() call immediately after closing the ZipFile. However, this results in ~75% monitor time on the threads, high CPU usage, and slow load times.
  2. Reducing thread-pool-count. This reduced the impact (to 300MB) yet resulted in significantly longer load times.
  3. Using a WeakReference around the buffers.
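For reference, the WeakReference attempt looked roughly like this (a reconstructed sketch with hypothetical names, not the exact code): holding the bytes behind a WeakReference so the collector is free to drop them, which also means there is no guarantee about *when* they go away.

```java
import java.lang.ref.WeakReference;

public class WeakBufferHolder {

    // Weakly held buffer: the GC may reclaim it at any time,
    // so the peak usage is still entirely up to the collector.
    private WeakReference<byte[]> buffer = new WeakReference<>(new byte[8192]);

    public byte[] get() {
        byte[] b = buffer.get();
        if (b == null) {
            b = new byte[8192];              // re-allocate if collected
            buffer = new WeakReference<>(b);
        }
        return b;
    }
}
```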

What I have so far:

[Heap size analysis graph]

I call the load through a 4-thread-count thread pool, each one performing the relatively simple task:

// Source source = ...;
final InputStream input = source.open();

// read into object model

input.close();

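The submission side looks roughly like this (a sketch with hypothetical names; the real task body is the open/read/close sequence above):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LoaderSketch {

    public static void main(final String[] args) throws Exception {
        // Hypothetical stand-in for the real list of ~70k sources.
        final List<String> sources = List.of("a.txt", "b.txt", "c.txt");

        // 4 threads, matching the pool described above.
        final ExecutorService pool = Executors.newFixedThreadPool(4);
        for (final String source : sources) {
            pool.submit(() -> {
                // source.open(), read into object model, input.close()
                System.out.println("loaded " + source);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```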
The Source in this case is a ZipFileSource which does all the reading:

import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileSource implements Source {

    private final String file;
    private final String name;

    private volatile ZipFile zip;

    public ZipFileSource(final String file, final String name) {
        this.file = file;
        this.name = name;
    }

    @Override
    public InputStream open() throws IOException {
        close();

        final ZipFile zipFile = new ZipFile(file);
        final ZipEntry entry = zipFile.getEntry(name);

        final InputStream stream = new ZipFileSourceZipInputStream(zipFile.getInputStream(entry));

        this.zip = zipFile;

        return stream;
    }

    @Override    
    public void close() throws IOException {
        if (zip != null) {
            zip.close();
            zip = null;
        }
    }

    private class ZipFileSourceZipInputStream extends InputStream {

        private final InputStream stream;

        ZipFileSourceZipInputStream(final InputStream stream) {
            this.stream = stream;
        }

        @Override
        public int read() throws IOException {
            return stream.read();
        }

        @Override
        public void close() throws IOException {
            ZipFileSource.this.close();
            stream.close();
        }
    }
}
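One variation I have been considering (a sketch only, hypothetical class name): drain the entry into a byte[] inside open() and close the ZipFile before returning, so the Inflater and its buffer become collectible before parsing even begins. The trade-off is holding the whole decompressed entry in memory, but for ~4MB entries that bounds the per-task footprint.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipFile;

public class EagerZipFileSource {

    private final String file;
    private final String name;

    public EagerZipFileSource(final String file, final String name) {
        this.file = file;
        this.name = name;
    }

    public InputStream open() throws IOException {
        // try-with-resources closes the ZipFile (and releases its Inflater)
        // before this method returns; only the raw bytes stay reachable.
        try (ZipFile zipFile = new ZipFile(file);
             InputStream in = zipFile.getInputStream(zipFile.getEntry(name))) {
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new ByteArrayInputStream(out.toByteArray());
        }
    }
}
```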

I'm running a bit short on ideas. I've come down to either using a native zip extractor, locking every n requests to do a System.gc() call, or just giving up and letting it do its thing.

Is there a way I can more effectively manage the memory before it builds up (requiring a garbage collection call)?

Obicere
  • Create smaller zip files in the first place :) – Eric Mar 29 '16 at 18:16
  • @EricWang I'm actually loading the entire Eclipse library I have. Because... why not? – Obicere Mar 29 '16 at 18:17
  • Are you just trying to make the JVM use less memory while the program runs? Or something else? – Eric Mar 29 '16 at 18:24
  • @EricWang I am trying to 'level-out' that memory spike that happens during the loading of the file. Afterwards there are no leaks or memory issues. – Obicere Mar 29 '16 at 18:25
  • In my opinion, it's not necessary. For performance, the JVM won't clear the heap immediately after some memory becomes unused. You can see from your graph that a `minor gc` or `full gc` was executed by the JVM 4 times automatically, cleaning up the young or old generation during file loading. Then you called a `full gc` explicitly. It's not a memory leak. GC interrupts the normal service of a Java program, so it's delayed until necessary (before running out of memory). You can add more memory and allocate it to the Java process if it really needs it. – Eric Mar 29 '16 at 18:32
  • @EricWang I just fiddled around with some stuff and [got this](http://i.imgur.com/uKiHmLX.png). Barely increased load time, yet you can see the vast difference in the spikes. However, I asked this question to see if there is a more appropriate solution than a `static int`, a remainder operator and a `System.gc()` call. It's honestly just to make the process smoother - not to resolve any issues (like memory leaks). So in my opinion, it is a necessary change. – Obicere Mar 29 '16 at 18:45
  • If your zip files are separated into many small zip files, then you can: 1) if you really need to control the memory used by the JVM, set `-Xmx` to a small value; the trade-off is that GC will be called more frequently, which I guess is not good for your service. 2) If you want the program to run really fast, allocate more memory, also via `-Xmx`, so that GC is called less frequently, and start n threads in parallel, where n = your CPU count. – Eric Mar 29 '16 at 18:48
  • On the surface your new fiddled result doesn't differ much in overall time. But in a real-time, mission-critical system (e.g. a stock trading system), it could be deadly. When you do GC, especially minor GC, the delay of service during that short period could cause a great financial loss, just due to several hundred milliseconds. – Eric Mar 29 '16 at 18:55
  • Perhaps try some kind of buffer to load smaller pieces of the ZIP to work with. If you want to load the entire ZIP in memory then the behavior you've shown is to be expected. – Jire Mar 29 '16 at 20:06
  • @Jire there are 576 individual jars. The largest I saw from a quick scan was about 4MB. I'd say those pieces are plenty small enough. – Obicere Mar 29 '16 at 20:20
  • Possible duplicate of [Java VM - does the freed memory return to the OS?](http://stackoverflow.com/questions/2419247/java-vm-does-the-freed-memory-return-to-the-os) – the8472 Mar 30 '16 at 09:26
  • @the8472 I changed my question to make it more clear how it is not a duplicate, as that attempts to solve the problem after it has been created. This question is more about not letting the memory build up at all. – Obicere Mar 30 '16 at 18:14
  • @Obicere, that's the same thing. When a major GC happens it will remove all collectible objects and then resize the heap accordingly, allowing as much "wastage" (which is there for a reason) as configured by those parameters. What more than that do you expect? You can't release memory that's still referenced. – the8472 Mar 31 '16 at 15:46
  • See also http://stackoverflow.com/questions/1481178/how-to-force-garbage-collection-in-java – Raedwald Mar 31 '16 at 16:00
  • Possible duplicate of [Force full garbage collection when memory occupation goes beyond a certain threshold](http://stackoverflow.com/questions/2448941/force-full-garbage-collection-when-memory-occupation-goes-beyond-a-certain-thres) – Raedwald Mar 31 '16 at 16:07
  • Once again, I'm going to point out that this does not need to be solved through garbage collection and probably shouldn't. – Obicere Mar 31 '16 at 16:08

1 Answer


A) if your application keeps running it will GC eventually and collect those objects when it needs the memory.

B) if your application is done at that point... well... just let the VM die and it'll release the memory back to the OS.

Either way, there is no real memory "waste".

The point of garbage collectors is to amortize the cost of collection over time. They can only do that by deferring it to some point in the future instead of trying to free() everything immediately, as manually managed languages do.

Also note that your chart only shows the used heap (blue) going down. From the OS perspective the allocated heap (orange) stays the same anyway, so that downwards slope on the blue chart doesn't gain you anything.
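If the transient footprint itself bothers you, the knob for it is the maximum heap, as suggested in the comments; capping it forces collections to happen earlier during loading, trading some load time for a lower peak. The flag value and jar name here are purely illustrative:

```shell
# Cap the heap at 256 MB so the collector runs before the spike can grow.
java -Xmx256m -jar loader.jar
```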

the8472
  • I rephrased the title to better reflect what I am trying to achieve. A) it's unnecessary to have so much memory allocated in the first place, so just leaving it as-is isn't acceptable. B) this is just at the loading phase. I never specifically said this has to be solved using `System.gc()`... I'd also prefer to solve it some other way (maybe I am missing a library that does this). The orange line doesn't reflect the actual memory allocated by the machine. I am just going to delete this question and make my own shared-buffer zip system, because nobody here is even close to being on topic. – Obicere Mar 31 '16 at 16:02