3

I know similar questions have been asked many times before, but I am still not sure when objects become eligible for GC and which approach is more efficient.

Approach one:

for (Item item : items) {
    MyObject myObject = new MyObject();
    //use myObject.
}

Approach Two:

MyObject myObject = null;
for (Item item : items) {
    myObject = new MyObject();
    //use myObject.
}

I understand: "By minimizing the scope of local variables, you increase the readability and maintainability of your code and reduce the likelihood of error". (Joshua Bloch).

But what about performance and memory consumption? In Java, objects are garbage collected when no reference to them is left. If there are, e.g., 100000 items, then 100000 objects will be created. In Approach One, each object will have a reference (myObject) to it, so are they not eligible for GC?

Whereas in Approach Two, every loop iteration removes the reference to the object created in the previous iteration, so surely objects start becoming eligible after the first loop iteration.

Or is it a trade off between performance and code readability & maintainability?

What have I misunderstood?

Note: Assuming I care about performance and myObject is not needed after the loop.

Thanks In Advance

webDeveloper
  • Performance might be a concern, but it's far better to write correct code and put the variable declaration inside the loop. *Then **measure** performance.* Never "optimize" anything until you're sure it needs it. Anything else is madness. – markspace Feb 11 '15 at 16:14
  • I think GC is smart enough to clear those objects in Approach Two after each loop if needed. I always prefer approach one, declaring the variable closer to where it is being used. – Bohn Feb 11 '15 at 16:16
  • There are 100000 objects created in both versions. The only difference between them is that the last-created object isn't eligible for GC as soon in one of them. – user207421 Feb 11 '15 at 16:22
  • I would expect that it is much more expensive to create them compared to GC cleaning up (depends of course on your object), then in total GC will have only little impact on performance. However, I would just try it out... would not be surprised if there is no measurable difference in performance at all – 463035818_is_not_an_ai Feb 11 '15 at 16:27
  • ^ no, not possible duplicate, definitely duplicate. – Mike Nakis Feb 11 '15 at 16:29
  • @MikeNakis no, not a definite duplicate, not even a possible duplicate. Those questions/answers do not mention, or at least do not clearly mention, when exactly objects become eligible for GC. – webDeveloper Feb 11 '15 at 17:08

4 Answers

5

If there are e.g. 100000 items then 100000 objects will be created in Approach One and each object will have a reference (myObject) to it so they are not eligible for GC?

No. From the garbage collector's point of view, both approaches work the same, i.e. no memory is leaked. With approach two, as soon as the following statement runs

myObject = new MyObject();

the previous MyObject that was being referenced becomes an orphan (unless while using that Object you passed it around, say, to another method where that reference was saved) and is eligible for garbage collection.

The difference is that once the loop runs out you would have the last instance of MyObject still reachable through the myObject reference originally created outside the loop.
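A minimal sketch of that difference, assuming a hypothetical `MyObject` that simply counts its own constructions and a list of strings standing in for `items` (neither is from the original question). Both loops create the same number of objects; the only thing that changes is whether the last one is still referenced once the loop is over.

```java
import java.util.Arrays;
import java.util.List;

class LoopScopeDemo {
    // Hypothetical stand-in for the question's MyObject; counts how many were built.
    static final class MyObject {
        static int created = 0;
        MyObject() { created++; }
    }

    // Approach one: the variable lives only inside the loop body.
    static int approachOne(List<String> items) {
        MyObject.created = 0;
        for (String item : items) {
            MyObject myObject = new MyObject();
            // use myObject
        }
        // myObject is not in scope here, so nothing in this method names the last instance.
        return MyObject.created;
    }

    // Approach two: the variable is declared before the loop and reassigned each time.
    static MyObject approachTwo(List<String> items) {
        MyObject.created = 0;
        MyObject myObject = null;
        for (String item : items) {
            myObject = new MyObject(); // the previous instance loses this reference
            // use myObject
        }
        // The last instance is still strongly reachable through myObject.
        return myObject;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println("approach one created: " + approachOne(items));
        System.out.println("approach two kept last: " + (approachTwo(items) != null));
        System.out.println("approach two created: " + MyObject.created);
    }
}
```

If `myObject` is not needed after the loop, the wider scope in approach two buys nothing: it only keeps one extra object alive a little longer.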


Does GC know when references go out of scope during the loop execution or it can only know at the end of method?

First of all there's only one reference, not references. It's the objects that are getting unreferenced in the loop. Secondly, the garbage collection doesn't kick in spontaneously. So forget the loop, it may not even happen when the method exits.

Notice that I said orphaned objects become eligible for GC, not that they get collected immediately. Garbage collection never happens in real time; it happens in phases. In the mark phase, the collector traces every object still reachable from a GC root (such as a live thread's stack) and treats everything else as garbage. Then in the sweep phase, the memory of the unreachable objects is reclaimed, and a compacting collector additionally moves the survivors together, much like defragmenting a hard drive. So it works more like a batch than a piecemeal operation.

GC isn't bothered about scopes or methods as such. It only looks for unreferenced objects and it does so when it feels like doing it. You can't force it. The only thing that you can be sure of is that GC would run if the JVM is running out of memory but you can't pin exactly when it would do so.

But, all this does not mean that GC can't kick in while the method executes or even while the loop is running. If you had, say, a Message Processor that processed 10,000 messages every 10 mins or so and then slept in between i.e. the bean waits within the loop, does 10,000 iterations and then waits again; GC would definitely kick into action to reclaim memory even though the method hasn't run to completion yet.
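One way to observe this yourself is to track each loop object through a `WeakReference` and ask for a collection afterwards. The sketch below is only illustrative: the class and method names are made up, and `System.gc()` is merely a request, so the number it prints can be anything from zero to `n` depending on the JVM, the collector and its settings.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class GcEligibilityDemo {
    // Creates n short-lived objects in a loop (approach one style) and
    // reports how many of them the collector has actually reclaimed.
    static int countCleared(int n) {
        List<WeakReference<Object>> refs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Object obj = new byte[1024];               // the per-iteration object
            refs.add(new WeakReference<Object>(obj));  // a weak reference does not keep it alive
        }

        System.gc(); // a hint only; the JVM is free to ignore it

        int cleared = 0;
        for (WeakReference<Object> ref : refs) {
            if (ref.get() == null) { // null means the referent was collected
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(countCleared(n) + " of " + n + " loop objects were collected");
    }
}
```

Whatever number you see, it reflects when the collector chose to run, not where the variable was declared.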

Ravi K Thapliyal
  • Good points and agree but when does the first object created in approach one become eligible? after the loop or after the first iteration? – webDeveloper Feb 11 '15 at 16:58
  • In the second iteration as soon as `MyObject myObject = new MyObject();` gets executed, the `MyObject` created in the **first** iteration become eligible for GC. On this front, there's no difference between the two approaches. – Ravi K Thapliyal Feb 11 '15 at 17:07
  • Does GC know when references go out of scope during the loop execution or it can only know at the end of method? EJP below has commented that it doesn't know during the loop? I didn't find anything useful on google. Thanks – webDeveloper Feb 16 '15 at 16:43
  • @webDeveloper Added an update. – Ravi K Thapliyal Feb 16 '15 at 18:09
4

You have misunderstood when objects become eligible for GC - they do this when they are no longer reachable from an active thread. In this context that means:

  • When the only reference to them goes out of scope (approach 1).
  • When the only reference to them is assigned another value (approach 2).

So, the instance of MyObject would be eligible for GC at the end of each loop iteration whichever approach was used. The difference (theoretically) between the two approaches is that the JVM would have to allocate memory for a new object reference each iteration in approach 1 but not in approach 2. However, this assumes the Java compiler and/or Just-In-Time compiler is not smart enough to optimise approach 1 to actually act like approach 2.

In any case, I would go for the more readable and less error prone approach 1 on the grounds that:

  • The performance overhead for a single object reference allocation is tiny.
  • It will probably get optimised away anyway.
BarrySW19
0

I would not expect declaring the variable inside a block to have a detrimental impact on performance.

At least notionally, the JVM allocates the stack frame at the start of the method and destroys it at the end. By implication, the frame will be large enough to accommodate all the local variables.

See section 2.6 in here: http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html

That is consistent with other languages such as C where resizing the stack frame as the function/method executes is an overhead with no apparent return.

So wherever you declare it shouldn't make a difference.

Indeed declaring variables in blocks may help the compiler realize that the effective size of the stack frame can be smaller:

void foo() {
    int x = 6;
    int y = 7;
    int z = 8;

    //.....
}

Versus

void bar() {
    {
        int x = 6;
        //....
    }
    {
        int y = 7;
        //....
    }
    {
        int z = 8;
        //....
    }
}

Notice that bar() clearly only needs one local variable slot, not three, because x, y and z are never live at the same time.

Though making the stack frame smaller is unlikely to have any real influence on performance!

However, a reference going out of scope may make the object it references available for garbage collection. You would otherwise need to set references to null, which is an untidy and unnecessary bother (and a teensy-weensy overhead).

Without question you should declare variables inside a loop if (and only if) you don't need to access them outside the loop.

IMHO block statements (like bar has above) are underused.

If a method proceeds in stages you can protect the later stages from variable pollution using blocks.

With suitable (short) comments it can often be a more readable (and efficient) way of structuring code than breaking it down into lots of private methods.
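As a rough illustration of staging a method with blocks, here is a small sketch. The method and its two stages are invented for this example; the point is only that each stage's temporaries (`sum`, `best`) are invisible to the stages that follow, while the results that are needed later are declared outside the blocks.

```java
class StagedMethodDemo {
    // Returns the sum of the values plus the largest value (a toy computation).
    static int summarize(int[] data) {
        int total;
        {
            // Stage 1: accumulate the sum; 'sum' cannot leak into stage 2.
            int sum = 0;
            for (int v : data) {
                sum += v;
            }
            total = sum;
        }

        int max;
        {
            // Stage 2: find the largest element; 'best' is local to this stage.
            int best = Integer.MIN_VALUE;
            for (int v : data) {
                best = Math.max(best, v);
            }
            max = best;
        }

        return total + max;
    }

    public static void main(String[] args) {
        System.out.println(summarize(new int[] {1, 2, 3}));
    }
}
```

Whether this reads better than two small private methods is a matter of taste; the blocks mainly help when the stages share too much state to pass around comfortably.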

I have a chunky algorithm (Hashlife) where making earlier artifacts available for garbage collection during the method can make the difference between getting to the end and getting OutOfMemoryError.

Persixty
  • I'd disagree - I think if you have sections in a method which can be naturally broken apart into blocks with minimal shared data then that's a pretty strong hint that they should be refactored as separate methods. – BarrySW19 Feb 11 '15 at 16:41
  • @BarrySW19 I agree. But sometimes it isn't minimal shared data. My `bar()` is a toy example. Assume each section builds a number of contributions to the final block. It's a fag (and overhead) in Java to return any more than one value from a method. If you comment the sections and have a 'folding editor' you'll start to realize that decomposition into many methods is far from the only and always best way to structure code. Methods are for re-use. Blocks are for structure! I know this is heresy. – Persixty Feb 11 '15 at 16:45
0

In both approaches objects will get Garbage collected.

In Approach 1: when the for loop exits, the objects referenced by the local variables declared inside the loop become eligible for garbage collection.

In Approach 2: when a new reference is assigned to the myObject variable, the earlier object no longer has a reference to it, so it becomes eligible for garbage collection, and so on for as long as the loop runs.

So in both approaches there is no performance bottleneck.

hasha
  • Not really, I would think GC is smart enough to collect objects that are out of scope. See Mike, Ravi and Barry's answers. – webDeveloper Feb 11 '15 at 17:17
  • @webDeveloper GC doesn't *know* when *references* go out of scope, except at the end of a method. There is no bytecode instruction corresponding to an interior `}`, *ergo* there cannot possibly be any associated JVM or HotSpot or GC action. – user207421 Feb 11 '15 at 23:34
  • @EJP am I wrong in saying that in approach one the first object created in iteration one will be eligible for GC in the second iteration and will be picked up by GC if it was to run during second iteration. I believe this is what Ravi and BarrySW19 are saying? – webDeveloper Feb 12 '15 at 12:34