
Is it possible for another process (Java or not) running on the same operating system and hardware to trigger a

java.lang.OutOfMemoryError: GC overhead limit exceeded

by consuming RAM, by causing a heavy CPU load, or by some other means?


From the Java 8 documentation

The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap...

and this somewhat older thread, I understand that this is time-sensitive. However, both seem to lack a proper specification of what those 98% refer to.

Edit 20201008: Added Link to the Garbage Collector Ergonomics

Andreas
  • Sure you can; see https://stackoverflow.com/questions/17112827/how-to-reproduce-java-outofmemoryerror-gc-overhead-limit-exceeded for how to reproduce this error easily. – Michal Drozd Oct 01 '20 at 11:36
    @MichalDrozd this question is about *another process* triggering the error – Holger Oct 01 '20 at 12:35

1 Answer


Yes, but this is very unlikely in a real life scenario.

For the JVM to throw java.lang.OutOfMemoryError: GC overhead limit exceeded, two conditions must be met:

  1. A GC cycle reclaims less than GCHeapFreeLimit (2%) of the heap;
  2. The JVM spends more than GCTimeLimit (98%) of its time doing GC.
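To make the two conditions concrete, here is a simplified model of the check in plain Java. It is a sketch, not HotSpot's implementation: it assumes each collection is reported as a wall-clock pause plus the mutator time before it, and it computes a per-collection ratio, whereas the real policy works on averaged statistics. The class name `GcOverheadModel` and the `consecutiveLimit` parameter (how many bad collections in a row trip the limit) are hypothetical; only the 98/2 defaults come from the `GCTimeLimit` and `GCHeapFreeLimit` flags named above.

```java
/**
 * Simplified, hypothetical model of the "GC overhead limit exceeded" decision.
 * Not HotSpot code: it checks both conditions on a single collection.
 */
public class GcOverheadModel {
    private final double gcTimeLimitPercent;     // cf. -XX:GCTimeLimit (default 98)
    private final double gcHeapFreeLimitPercent; // cf. -XX:GCHeapFreeLimit (default 2)
    private final int consecutiveLimit;          // hypothetical: bad collections in a row needed
    private int consecutiveBad = 0;

    public GcOverheadModel(double gcTimeLimitPercent, double gcHeapFreeLimitPercent, int consecutiveLimit) {
        this.gcTimeLimitPercent = gcTimeLimitPercent;
        this.gcHeapFreeLimitPercent = gcHeapFreeLimitPercent;
        this.consecutiveLimit = consecutiveLimit;
    }

    /** Returns true if this model would throw "GC overhead limit exceeded". */
    public boolean onCollection(long pauseMillis, long mutatorMillis, long freeBytes, long capacityBytes) {
        long wall = pauseMillis + mutatorMillis;
        double gcPercent = wall == 0 ? 0.0 : 100.0 * pauseMillis / wall;
        double freePercent = 100.0 * freeBytes / capacityBytes;
        // Both conditions must hold: lots of time in GC AND almost nothing free.
        boolean bad = gcPercent > gcTimeLimitPercent && freePercent < gcHeapFreeLimitPercent;
        consecutiveBad = bad ? consecutiveBad + 1 : 0;
        return consecutiveBad >= consecutiveLimit;
    }

    public static void main(String[] args) {
        long heap = 1L << 30; // 1 GB
        GcOverheadModel model = new GcOverheadModel(98, 2, 1);
        // 99% of time in GC, but half the heap is free: no error.
        System.out.println(model.onCollection(1000, 10, heap / 2, heap)); // false
        // 99% of time in GC and about 1% free: both conditions hold.
        System.out.println(model.onCollection(990, 10, heap / 100, heap)); // true
        // Heap still nearly full, but only 50% of time in GC: no error.
        System.out.println(model.onCollection(500, 500, heap / 100, heap)); // false
    }
}
```

The model shows why an outside process can only influence the time term: the free-heap term depends solely on what the JVM itself keeps reachable.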

An external process can hardly affect the first condition unless it interacts directly with the target application. This means the JVM must already be in an "almost out of memory" state for the error to happen.

What another process can affect is the timing. If it heavily uses shared CPU resources, it can make GC run slower by competing with the JVM for CPU time. Slower GC means longer GC cycles and thus a higher percentage of time spent in GC.
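One way to watch that percentage from inside a running JVM is the standard `java.lang.management` API: `GarbageCollectorMXBean.getCollectionTime()` reports the approximate accumulated elapsed (wall-clock) collection time in milliseconds, or -1 if undefined. The sketch below samples it once per second and also prints the free share of the maximum heap. It is an approximation of the quantities in the two conditions, not the exact metric HotSpot's overhead policy uses; the class and method names are hypothetical, and to observe a real workload the loop would have to run inside that JVM (for example on a daemon thread), since a standalone `main` only sees its own, nearly idle heap.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeMonitor {
    /** Sum of approximate accumulated GC elapsed time (ms) over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Percentage of a wall-clock interval spent in GC, capped at 100. */
    public static double gcPercent(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return Math.min(100.0, 100.0 * gcMillisDelta / wallMillisDelta);
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        long prevGc = totalGcMillis();
        long prevWall = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long gc = totalGcMillis();
            long wall = System.nanoTime();
            double gcPct = gcPercent(gc - prevGc, (wall - prevWall) / 1_000_000);
            long used = rt.totalMemory() - rt.freeMemory();
            double freePct = 100.0 * (rt.maxMemory() - used) / rt.maxMemory();
            System.out.printf("GC time: %.1f%%, free heap: %.1f%%%n", gcPct, freePct);
            prevGc = gc;
            prevWall = wall;
        }
    }
}
```

Because the reported time is elapsed time, anything that stretches a pause, including CPU throttling caused by another process, shows up directly in this percentage.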

I was able to create an artificial example in which another process makes the JVM throw "GC overhead limit exceeded", but it was really tricky.

Consider the following Java program.

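```java
import java.util.ArrayList;

public class GCOverheadLimit {
    // Keeps every array added to it reachable, so GC cannot reclaim them
    static ArrayList<Object> garbage = new ArrayList<>();
    // Small block released once the heap is full, so allocation can continue
    static byte[] reserve = new byte[100_000];

    static void fillHeap() {
        try {
            // Allocate 10 KB arrays until the heap is exhausted
            while (true) {
                garbage.add(new byte[10_000]);
            }
        } catch (OutOfMemoryError e) {
            // Release ~100 KB of headroom for the loop in main
            reserve = null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Filling heap");
        fillHeap();

        System.out.println("Starting GC loop");
        while (true) {
            // Short-lived garbage: with the heap nearly full, this keeps
            // triggering collections that reclaim very little
            garbage.add(new byte[10_000]);
            garbage.remove(garbage.size() - 1);
            // Mutator time between iterations keeps GC overhead below 98%
            Thread.sleep(20);
        }
    }
}
```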

First, it fills the entire heap with non-reclaimable objects, leaving a small reserve of free memory. Then it repeatedly allocates reclaimable garbage to make GC happen again and again. A small delay between iterations keeps the total GC overhead below 98%.
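The role of the delay can be seen with a little arithmetic. As a simplification, let P be the wall-clock length of a GC pause and M the mutator time between pauses, and treat the 20 ms sleep as the whole of M (ignoring the time spent allocating, an assumption). The GC time fraction f then crosses the 98% limit exactly when the pause exceeds 49 times the mutator interval:

```latex
f = \frac{P}{P + M} > 0.98
\iff 0.02\,P > 0.98\,M
\iff P > 49\,M,
\qquad M = 20\ \text{ms} \;\Rightarrow\; P > 980\ \text{ms}
```

So the program stays just below the limit as long as pauses are somewhat shorter than about a second; anything that stretches those pauses, without the program itself changing, can push f over the threshold.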

The experiment uses 1GB heap and the Parallel GC:

```sh
java -Xmx1g -Xms1g -XX:+UseParallelGC GCOverheadLimit
```

I run this program in a cgroup (cgroup v1) with a CPU quota. My machine has 4 cores, but I let the JVM use only 200 ms of CPU time per 100 ms period, i.e. the equivalent of 2 cores.

```sh
mkdir /sys/fs/cgroup/cpu/test
echo 200000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
echo $JAVA_PID > /sys/fs/cgroup/cpu/test/cgroup.procs
```

So far, the program works fine. Now I run one or two CPU-burning processes in the same cgroup:

```sh
sha1sum /dev/zero &
echo $! > /sys/fs/cgroup/cpu/test/cgroup.procs
```

Because the processes in the cgroup now exceed the quota, the kernel starts to throttle them. GC pauses get longer, and the JVM finally throws java.lang.OutOfMemoryError: GC overhead limit exceeded.
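A rough estimate of how much the burners stretch a pause, under idealized assumptions: `cpu.cfs_period_us` is left at its default of 100 000 µs (consistent with the 100 ms period above), the parallel collector runs g = 4 CPU-bound worker threads (an assumption; HotSpot picks the default from the CPU count it sees), other JVM threads are ignored, and the scheduler shares the quota evenly among all runnable threads. With b burner processes added, each GC thread's share drops from 2/g to 2/(g+b) CPUs:

```latex
\text{CPUs} = \frac{\texttt{cpu.cfs\_quota\_us}}{\texttt{cpu.cfs\_period\_us}}
            = \frac{200\,000\ \mu\text{s}}{100\,000\ \mu\text{s}} = 2,
\qquad
\frac{P_{\text{with burners}}}{P_{\text{alone}}}
  \approx \frac{2/g}{2/(g+b)} = \frac{g+b}{g},
\qquad
g = 4:\quad b = 1 \Rightarrow 1.25,\quad b = 2 \Rightarrow 1.5
```

Under this model, one or two burners lengthen each CPU-bound GC pause by roughly 25% to 50% in wall-clock terms, which is the kind of shift that matters only when the program already runs just below the limit.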

Note: reproducing the problem required careful selection of parameters (heap size, delays, quota). The parameters will differ on other machines and in other environments. My point is: the problem is theoretically possible, but will probably never happen in practice, since too many factors need to line up.

apangin
  • Oh man! I was trying the same thing (very close to this) without much success; this is very, _very_ nice. But does this mean _another_ process triggers the error? This surely looks like another process _influences_ the JVM. I am not sure if it's just me or if these are interchangeable here. – Eugene Oct 08 '20 at 03:45
  • @Eugene For me, "forcing the JVM to throw the error" and "triggering the error" are interchangeable. – Andreas Oct 08 '20 at 07:25
  • Indeed, a nice example. So basically this boils down to a heavy CPU (over)load on the machine while, at the same time, the GC is under pressure. On a larger machine, e.g. 30+ HT cores with 300+ GB of RAM, the Parallel GC itself may impose quite a lot of CPU load. How much additional CPU load from competing processes is needed to "tip the scales"? And is this only because the Parallel GC (presumably) bases its time calculation on elapsed real (wall-clock) time instead of CPU time? – Andreas Oct 08 '20 at 07:51
  • @Andreas Right, the GC overhead policy counts pause times, not CPU time. I can't speak for 30+ cores, but my point was that even though it's theoretically possible for another process to cause the JVM to throw the "GC overhead limit" error, it doesn't really matter in practice: when free memory is that low, there is a problem anyway. – apangin Oct 09 '20 at 01:22