
Using a 1 GB Java heap (`-Xmx1g`), I store data in many large byte arrays. I get an `OutOfMemoryError` quite some time before 1 GB of data is stored. At that point there is still quite a lot of free heap according to `Runtime rt`, calculated as `rt.maxMemory() - rt.totalMemory() + rt.freeMemory()`:

| byte array size | approx. data that can be stored | approx. free heap shown |
|-----------------|---------------------------------|-------------------------|
| 2^18 (262144)   | 800 MB                          | 270 MB                  |
| 2^17 (131072)   | 930 MB                          | 140 MB                  |
| 2^16 (65536)    | 997 MB                          | 72 MB                   |
| 2^15 (32768)    | 1032 MB                         | 36 MB                   |

Why is the free-heap calculation so far off for large byte arrays, and can I do something to mend it?


Note: with byte arrays of size 2^19 or larger, a different effect occurs (Java byte array of 1 MB or more takes up twice the RAM) - let's focus this question on 2^18-sized byte arrays.

Run with the 64-bit Server VM of AdoptOpenJDK 11.0.11, both on Windows (`java -cp .\lib\* -Xmx1g tryit.Main`) and on Debian (`java -cp .:./lib/* -Xmx1g tryit.Main`):

package tryit;

public class Main {
    public static void main(String[] args) throws Exception {
        byte[][] array = new byte[1000000][];
        long freeAtStart = free();
        System.out.println("Free at start: " + freeAtStart);
        int chunkSize = 2<<17; // This is 2^18.
        System.out.println("Chunk size   : " + chunkSize);
        for (int n = 0; n < 1000000; n++) {
            if (n % 50 == 0) {
                long currentFree = free();
                System.out.printf("%d: stored %d / allocated %d / free %d\n", n, (long) n * chunkSize, freeAtStart - currentFree, currentFree); // Cast to long: n * chunkSize overflows int for large n.
            }
            array[n] = new byte[chunkSize];
        }
    }
    static long free() throws Exception {
        System.gc(); // Called just in case - there should not be anything to garbage collect.
        Thread.sleep(100); // Give GC some time to work
        return Runtime.getRuntime().maxMemory() - Runtime.getRuntime().totalMemory() + Runtime.getRuntime().freeMemory();
    }
}

And finally the (shortened) output of four runs:

2^15:
Free at start: 1068751960 / Chunk size: 32768
31500: stored 1032192000 / allocated 1032933912 / free 35818048

2^16:
Free at start: 1068751960 / Chunk size: 65536
15200: stored 996147200 / allocated 996627400 / free 72124560

2^17:
Free at start: 1068751960 / Chunk size: 131072
7100: stored 930611200 / allocated 930960032 / free 137791928

2^18:
Free at start: 1068751960 / Chunk size: 262144
3050: stored 799539200 / allocated 799823160 / free 268928800

2^19 (humongous objects - allocation size is two times stored size):
Free at start: 1068751960 / Chunk size: 524288
1000: stored 524288000 / allocated 1048811120 / free 19940840
Mark Rotteveel
Georg
  • Your test is not correct: `System.gc();` does **not** guarantee that a GC runs. It is **only a recommendation** to the JVM, which may choose **not** to collect at all. – Oleg Cherednik Jul 10 '21 at 21:37
  • I call `System.gc()` "just in case". There is nothing to garbage collect (everything is strongly referenced), so the example does not need to trigger a gc at all. – Georg Jul 10 '21 at 21:43
  • This is my output for 2^18 on Java 17: `3050: stored 799539200 / allocated 1064828992 / free 1617856` *edit* that was with the `gc()` call removed, with it I get `267966664`, i.e. very similar to you. So the gc call does make a difference? – tgdavies Jul 10 '21 at 23:26
  • that `1GB` is not used only by you - the VM has its own data too, and it's not even entirely given to you: garbage collectors need their own structures to work, like SATB queues, or even [things like this](https://stackoverflow.com/a/64236979/1059372). What you see is pretty normal, to me. – Eugene Jul 11 '21 at 00:35
  • @oleg.cherednik under the current implementations (I checked a few) it will, assuming default flags. The thing is that `System.gc` starts a concurrent mark phase - what that triggers (or not) in turn depends on the current heap. – Eugene Jul 11 '21 at 00:38

1 Answer


As described in the linked answer (Java byte array of 1 MB or more takes up twice the RAM) and in the G1 garbage collector documentation, the G1 garbage collector divides the heap into regions of 1 MB (2^20 bytes) each. For a 1 GB heap that gives 1024 regions (probably a few less due to administrative overhead).
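If you want to check which region size your own JVM ended up with, HotSpot exposes the effective `-XX:G1HeapRegionSize` value as a diagnostic VM option. A minimal sketch (HotSpot-specific: it uses the `com.sun.management` API from the `jdk.management` module, so it won't compile on other JVMs):

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class ShowRegionSize {
    public static void main(String[] args) {
        // HotSpot-specific: reads the effective value of -XX:G1HeapRegionSize in bytes.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("G1HeapRegionSize = "
                + hotspot.getVMOption("G1HeapRegionSize").getValue());
        // Started with -Xmx1g and otherwise default settings, G1's ergonomics
        // pick 1 MB regions, so this prints 1048576.
    }
}
```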

Naively you would expect that a region of 2^20 bytes can hold 4 byte arrays of 2^18 bytes each - but unfortunately this is not the case. Byte arrays are objects, and objects have a hidden object header (see https://stackoverflow.com/a/50509263 for an explanation).

So the effective size of a `byte[262144]` is not 262144 bytes but 262160 bytes (depending on the JVM and the maximum heap size it could be even more), which means that each region can hold only 3 byte arrays of length 262144.

Combining 3 byte arrays per region with 1024 regions would give you a maximum of 3072 byte arrays of 262144 bytes for a 1 GB heap, which matches nicely with your numbers.
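That arithmetic can be verified in a few lines. The 16-byte array header and 8-byte object alignment are assumptions that hold for a typical 64-bit HotSpot VM with compressed oops, not guaranteed constants:

```java
public class RegionMath {
    public static void main(String[] args) {
        final long REGION = 1L << 20; // 1 MB G1 region
        final long HEADER = 16;       // assumed array header (64-bit HotSpot, compressed oops)
        final long ALIGN = 8;         // assumed object alignment

        // Footprint of one byte[262144]: header plus data, rounded up to the alignment.
        long footprint = (HEADER + 262144 + ALIGN - 1) & -ALIGN;
        System.out.println(footprint);                   // 262160
        System.out.println(REGION / footprint);          // 3 arrays per region
        System.out.println(1024 * (REGION / footprint)); // 3072 arrays per 1 GB heap
    }
}
```

3072 arrays of 262144 bytes come to 805306368 bytes, which lines up with the roughly 800 MB observed before the `OutOfMemoryError`.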


What you can do about it:

  • use larger regions (by supplying `-XX:G1HeapRegionSize=4M`) - a 4 MB region can hold 15 byte arrays of length 262144, whereas 4 regions of 1 MB can hold only 12 byte arrays of length 262144
  • use slightly smaller byte arrays - a 1 MB region can hold only 3 byte arrays of length 262144, but 4 byte arrays of length 262128
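Both options follow from the same arithmetic (again assuming a 16-byte array header): a `byte[262128]` occupies exactly 16 + 262128 = 262144 bytes, a quarter of a 1 MB region, while a 4 MB region fits 15 arrays of the original length:

```java
public class PackingCheck {
    public static void main(String[] args) {
        final long HEADER = 16; // assumed array header size (64-bit HotSpot, compressed oops)
        // byte[262128]: footprint 262144 bytes, exactly a quarter of a 1 MB region.
        System.out.println((1L << 20) / (HEADER + 262128)); // 4 arrays per 1 MB region
        // byte[262144] with -XX:G1HeapRegionSize=4M: footprint 262160 bytes.
        System.out.println((4L << 20) / (HEADER + 262144)); // 15 arrays per 4 MB region
    }
}
```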

Note: this post uses 2^20 to mean two to the power of twenty. That is not the same as the Java expression `2^20` (which is bitwise XOR and evaluates to 22); the corresponding Java expression is `1<<20`.

Thomas Kläger