54

Assuming we have a max memory of 256M, why does this code work:

public static void main(String... args) {
  for (int i = 0; i < 2; i++)
  {
      byte[] a1 = new byte[150000000];
  }
  byte[] a2 = new byte[150000000];
}

but this one throws an OOME?

public static void main(String... args) {
  //for (int i = 0; i < 2; i++)
  {
      byte[] a1 = new byte[150000000];
  }
  byte[] a2 = new byte[150000000];
}
Marko Topolnik
Evgeniy Dorofeev
  • Or the compiler does not even compile the for loop, because you are not using a1 – AlexWien Nov 23 '12 at 14:31
  • You can freely change the code to eliminate that. For example `System.out.println(a1.length);`. – Marko Topolnik Nov 23 '12 at 14:32
  • Yes, that should be done in both snippets of that example – AlexWien Nov 23 '12 at 14:33
  • The follow-up question is: why is a1 not garbage collected in the second example? – Jerome Nov 23 '12 at 14:40
  • @jerome Because garbage collection is delayed! It does not start immediately. If you want to force garbage collection, set a1 or a2 = null; and call System.gc() 5 times. That works, but it is advisable only in special high-memory situations, e.g. a one-time import of huge data sets – AlexWien Nov 23 '12 at 14:41
  • @AlexWien But shouldn't the GC run once before OOME is raised? –  Nov 23 '12 at 14:45
  • @delnan No, it runs when it wants. (;-)) That's the reason hard real-time systems do not like garbage collectors: for the application it is not predictable when the GC runs. – AlexWien Nov 23 '12 at 14:45
  • I know it runs when it wants in general. But I vaguely recall a statement that OOME is only raised if allocation failed *and a GC run failed to free sufficient memory*. –  Nov 23 '12 at 14:47
  • @Marko But it is not guaranteed that it is successful! I was able to improve my code and successfully import some GB of data by explicitly calling the GC when I knew that I did not need the data anymore. In most situations this is not good practice, but in some it is. – AlexWien Nov 23 '12 at 14:48
  • 1
    I don't think that setting a variable to null and calling gc 5 or 100 times guarantees that gc is actually executed... @Jerome, in the second example a1 is not garbage collected because it remains in memory as a static variable and it's available to any instances of the class – Marsellus Wallace Nov 23 '12 at 14:48
  • 1
    @Gevorg There are no static variables involved here. – Marko Topolnik Nov 23 '12 at 14:49
  • @gevorg I believe it has been proven that `a1` is not static – Jerome Nov 23 '12 at 14:50
  • @MarkoTopolnik Well, Java semantics *pertaining to the GC* (if it's even Java semantics, and not a guarantee of a specific JVM -- I'm not sure on that). Your own phrasing refers to the GC too. But that's just nitpicking, what matters is that this answer doesn't provide an explanation. –  Nov 23 '12 at 14:50
  • @Jerome, It has always worked in my programs. You could search for MemoryTestBench, which used that technique to measure the amount of memory a Java object uses. But they call gc 20 times. – AlexWien Nov 23 '12 at 14:51

2 Answers

35

To keep things in perspective, consider running this code with -Xmx64m:

static long sum;
public static void main(String[] args) {
  System.out.println("Warming up...");
  for (int i = 0; i < 100_000; i++) test(1);
  System.out.println("Main call");
  test(5_500_000);
  System.out.println("Sum: " + sum);
}

static void test(int size) {
//  for (int i = 0; i < 1; i++)
  {
    long[] a2 = new long[size];
    sum += a2.length;
  }
  long[] a1 = new long[size];
  sum += a1.length;
}

Depending on whether you run the warm-up loop or skip it, the main call either succeeds or throws an OOME. This is because the JIT-compiled code properly nulls out the variable, whereas the interpreted code doesn't. Both behaviors are acceptable under the Java Language Specification, which means that you are at the mercy of the JVM here.

Tested with Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) on OS X.
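To see why it matters whether the first array is still retained, a rough size check helps. The sketch below is not part of the original test: the class name `HeapBudget` and its helpers are made up for illustration, and it ignores array headers and the way the heap is divided into generations, so the numbers are approximate.

```java
public class HeapBudget {
    // Approximate payload of a primitive array; ignores the object header.
    static long payloadBytes(int length, int bytesPerElement) {
        return (long) length * bytesPerElement;
    }

    // True if `count` arrays of that size could be live at once within maxHeapBytes.
    static boolean fits(int count, int length, int bytesPerElement, long maxHeapBytes) {
        return count * payloadBytes(length, bytesPerElement) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long xmx64m = 64L * 1024 * 1024;   // heap limit used in this answer
        long xmx256m = 256L * 1024 * 1024; // heap limit assumed in the question

        System.out.println("one long[5_500_000] within 64m:  "
                + fits(1, 5_500_000, Long.BYTES, xmx64m));
        System.out.println("two long[5_500_000] within 64m:  "
                + fits(2, 5_500_000, Long.BYTES, xmx64m));
        System.out.println("one byte[150000000] within 256m: "
                + fits(1, 150_000_000, Byte.BYTES, xmx256m));
        System.out.println("two byte[150000000] within 256m: "
                + fits(2, 150_000_000, Byte.BYTES, xmx256m));

        // What this JVM was actually started with (the second property is HotSpot-specific).
        System.out.println("max heap:  " + Runtime.getRuntime().maxMemory());
        System.out.println("VM info:   " + System.getProperty("java.vm.info"));
    }
}
```

In both setups one array fits but two do not, so the second allocation can only succeed if the first array has become collectible by then. On HotSpot, comparing a run with `-Xint` (interpreter only) against the default mixed mode is another way to probe the difference, though the outcome is still up to the JVM.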

Bytecode analysis

Look at the bytecode with the for loop (simple code, without the sum variable):

static void test(int);
  Code:
   0: iconst_0
   1: istore_1
   2: goto  12
   5: iload_0
   6: newarray long
   8: astore_2
   9: iinc  1, 1
   12:  iload_1
   13:  iconst_1
   14:  if_icmplt 5
   17:  iload_0
   18:  newarray long
   20:  astore_1
   21:  return

and without:

static void test(int);
  Code:
   0: iload_0
   1: newarray long
   3: astore_1
   4: iload_0
   5: newarray long
   7: astore_1
   8: return

No explicit nulling out in either case, but note that in the version without the for loop the same local variable slot (1) is reused for both arrays, whereas in the for version the two arrays go to different slots (2 and 1). If anything, this would lead you to expect the opposite of the observed behavior.

A twist...

Based on what we learned from the bytecode, try running this:

public static void main(String[] args) {
  {
    long[] a1 = new long[5_000_000];
  }
  long[] a2 = new long[0];
  long[] a3 = new long[5_000_000];
}

No OOME thrown. Comment out the declaration of a2, and it is back. We allocate more, but occupy less? Look at the bytecode:

public static void main(java.lang.String[]);
  Code:
     0: ldc           #16                 // int 5000000
     2: istore_1      
     3: ldc           #16                 // int 5000000
     5: newarray       long
     7: astore_2      
     8: iconst_0      
     9: newarray       long
    11: astore_2      
    12: ldc           #16                 // int 5000000
    14: newarray       long
    16: astore_3      
    17: return        

Local variable slot 2, used for a1, is reused for a2. The same is true for OP's code, but now we overwrite the slot with a reference to an innocuous zero-length array, and use another slot to store the reference to our huge array.

To sum it up...

The Java Language Specification does not require that any garbage object be collected, and the JVM spec only says that the "frame" with local variables is destroyed as a whole upon method completion. Therefore all the behaviors we have witnessed are by the book. The invisible state of an object (mentioned in the document linked to by Keppil) is just a way to describe what happens in some implementations under some circumstances; it is in no way canonical behavior.
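Since the one thing the JVM spec does say is that a frame goes away when its method completes, one way to avoid depending on slot reuse or on the JIT is to move the first allocation into its own method. This is only a sketch: `ScopedAllocation` and `useFirst` are hypothetical names, and whether the collector reclaims the array in time is still up to the JVM.

```java
public class ScopedAllocation {
    static long sum;

    // The array lives only in this method's frame. Once the method returns,
    // the frame is destroyed and no local variable refers to the array anymore.
    static void useFirst(int size) {
        long[] a2 = new long[size];
        sum += a2.length;
    }

    static void test(int size) {
        useFirst(size);              // first array is unreachable after this call
        long[] a1 = new long[size];  // second allocation no longer competes with a stale slot
        sum += a1.length;
    }

    public static void main(String[] args) {
        // Small default so the sketch runs anywhere; pass 5500000 together
        // with -Xmx64m to mirror the test in this answer.
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        test(size);
        System.out.println("Sum: " + sum);
    }
}
```

The design choice here is to rely on reachability through frames, which all the behaviors above have in common, instead of on block scope, which leaves no trace in the bytecode.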

Marko Topolnik
  • ... but the Java Virtual Machine Specification describes the internal data structures of the VM. In particular, it writes that [local variables](http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.6.1) are stored in a frame, and the [frame](http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.6) is destroyed when the method invocation completes. Similarly, the JVM spec [mandates](http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.5.3) the use of an "automatic storage management system (known as a garbage collector)". – meriton Nov 23 '12 at 20:57
  • Admittedly, "the Java virtual machine assumes no particular type of automatic storage management system". – meriton Nov 23 '12 at 21:01
  • @meriton So all the spec has to say is that the objects become unreachable when the method as a whole completes. Nothing is said of variables in or out of scope; also nothing specific is required to happen to an unreachable object at any particular time. – Marko Topolnik Nov 23 '12 at 21:17
  • Since the scope of variables is lost in translation to bytecode, it cannot affect garbage collection (unless the compiler chooses to reuse the local variable). But yes, the spec is quite vague on what *does* affect garbage collection. Practically speaking though, all major JVMs identify garbage by a reachability analysis, the root set of which includes the active frames. – meriton Nov 23 '12 at 22:32
  • @meriton Yes, from the practical standpoint it is quite clear what is happening in each case, except (for me, at least) how exactly interpreted code manages to release the object allocated in the loop, since the bytecode doesn't indicate it will happen. Apparently scope **does** somehow influence this, although clearly in no obvious way. – Marko Topolnik Nov 23 '12 at 22:44
26

This is because while a1 isn't in scope after the brackets, it is in a state called invisible until the method returns.

Most modern JVMs don't set the variable a1 to null as soon as it leaves scope (in fact, whether the inner brackets are there or not doesn't even change the generated bytecode), because doing so would be very inefficient and usually doesn't matter. Therefore, a1 can't be garbage collected until the method returns.

You can check this by adding the line

a1 = null;

inside the brackets, which makes the program run fine.
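For reference, a complete version of that variant might look like the sketch below. The class name `NullOut` and the `run` helper are made up for illustration, and the array size is a parameter so that it can be tried at any heap size.

```java
public class NullOut {
    // Allocates two arrays of the given size one after the other and
    // returns the length of the second.
    static int run(int size) {
        {
            byte[] a1 = new byte[size];
            a1 = null; // drop the only reference before leaving the block
        }
        byte[] a2 = new byte[size];
        return a2.length;
    }

    public static void main(String[] args) {
        // Small default so the sketch runs anywhere; pass 150000000 together
        // with -Xmx256m to mirror the code in the question.
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        System.out.println("Second array length: " + run(size));
    }
}
```

With the explicit assignment the local variable no longer holds a reference to the first array, so the collector is free to reclaim it before the second allocation.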

The term invisible and the explanation are taken from this old paper: http://192.9.162.55/docs/books/performance/1st_edition/html/JPAppGC.fm.html.

Paul Bellora
Keppil
  • Most modern JVMs will set the var to `null` regardless of scope as soon as the last use is over, but this goes only for JITted code. – Marko Topolnik Nov 23 '12 at 14:52
  • 1
    Keppil, is this by the JLS? A reference would be great. – Marko Topolnik Nov 23 '12 at 14:53
  • Mind that bytecode is the same with the `for` variant, too. No explicit `null`ing in either case. – Marko Topolnik Nov 23 '12 at 14:55
  • That document does have an explanation, but, since it is not specified, it just restates what we already know: `a1` is retained outside the block. This just boils down to the fact that the JLS is really silent about reachability scope of objects. – Marko Topolnik Nov 23 '12 at 15:16