0

I am scratching my head trying to understand the point of the following code

    Map<String Set<MyOtherObj>> myMap = myapi.getMyMap(); 

    final MyObj[] myObjList;
    {
        final List<MyObj> list = new ArrayList<>(myMap.size());
        for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
            final int myCount = MyUtility.getCount(entry.getValue());
            if (myCount <= 0)
                continue;
            list.add(new MyObj(entry.getKey(), myCount));
        }
        if (list.isEmpty())
            return;
        myObjList = list.toArray(new MyObj[list.size()]);
    }

Which can be rewrite into the following

    Map<String Set<MyOtherObj>> myMap = myapi.getMyMap(); 

    final List<MyObj> list = new ArrayList<>(myMap.size());
    for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
        final int myCount = MyUtility.getCount(entry.getValue());
        if (myCount <= 0)
            continue;
        list.add(new MyObj(entry.getKey(), myCount));
    }
    if (list.isEmpty())
        return;

The only reason I can think of why we put the ArrayList in a block and then reassign the content to an array is

  1. The size of ArrayList is bigger than the size of list, so reassigning ArrayList to array save space
  2. There is some sort of compiler magic or gc magic that deallocates and reclaim the memory use by ArrayList immediately after the block scope ends (eg. like rust), otherwise we are now sitting on up to 2 times amount of space until gc kicks in.

So my question is, does the first code sample make sense, is it more efficient?

This code currently executes 20k message per second.

user207421
  • 305,947
  • 44
  • 307
  • 483
user10714010
  • 865
  • 2
  • 13
  • 20
  • The mechanism of garbage collection is _specifically_ left undefined by the Java spec; technically, it isn't required at all. Modern JVMs do in fact trade memory consumption for efficiency (specifically, less competition for locks around structures tracking memory allocation). – chrylis -cautiouslyoptimistic- Nov 21 '19 at 01:26
  • 1
    No it doesn't. There is no bytecode instruction corresponding to exiting a local block. – user207421 Nov 21 '19 at 01:31
  • @AlbertoSinigaglia That is completely incorrect, for the reason I just stated. – user207421 Nov 21 '19 at 01:32
  • 1
    @user207421 you are right, just compiled the two versions and the .class is the same – Alberto Sinigaglia Nov 21 '19 at 01:37
  • @AlbertoSinigaglia You didn't get the same byte code from both versions of the OP's code. You must have got it from something else in versions with and without an inner block. – user207421 Nov 21 '19 at 02:28
  • Meanwhile I am scratching my head as to how the first version of the OP's code even compiles, unless `myObjList` is unused afterwards outside the inner block, which makes it all pretty pointless. – user207421 Nov 21 '19 at 02:30
  • myObjList is defined outside of the block @user207421 – Lee Nov 21 '19 at 11:02
  • @Lee We can see that, but it is only conditionally initialized inside the block, so any subsequent use after the block should draw an 'initialized variable' compiler error. – user207421 Nov 23 '19 at 03:42
  • @user207421 There is no way to reach end of the block with uninitialized myObjList. So there will be no compile error because of that – Lee Nov 26 '19 at 07:52

2 Answers2

3

As stated in this answer:

Scope is a language concept that determines the validity of names. Whether an object can be garbage collected (and therefore finalized) depends on whether it is reachable.

So, no, the scope is not relevant to garbage collection, but for maintainable code, it’s recommended to limit the names to the smallest scope needed for their purpose. This, however, does not apply to your scenario, where a new name is introduced to represent the same thing that apparently still is needed.

You suggested the possible motivation

  1. The size of ArrayList is bigger than the size of list, so reassigning ArrayList to array save space

but you can achieve the same when declaring the variable list as ArrayList<MyObj> rather than List<MyObj> and call trimToSize() on it after populating it.

There’s another possible reason, the idea that subsequently using a plain array was more efficient than using the array encapsulated in an ArrayList. But, of course, the differences between these constructs, if any, rarely matter.

Speaking of esoteric optimizations, specifying an initial array size when calling toArray was believed to be an advantage, until someone measured and analyzed, to find that, i.e. myObjList = list.toArray(new MyObj[0]); would be actually more efficient in real life.

Anyway, we can’t look into the author’s mind, which is the reason why any deviation from straight-forward code should be documented.

Your alternative suggestion:

  1. There is some sort of compiler magic or gc magic that deallocates and reclaim the memory use by ArrayList immediately after the block scope ends (eg. like rust), otherwise we are now sitting on up to 2 times amount of space until gc kicks in.

is missing the point. Any space optimization in Java is about minimizing the amount of memory occupied by objects still alive. It doesn’t matter whether unreachable objects have been identified as such, it’s already sufficient that they are unreachable, hence, potentially reclaimable. The garbage collector will run when there is an actual need for memory, i.e. to serve a new allocation request. Until then, it doesn’t matter whether the unused memory contains old objects or not.

So the code may be motivated by a space saving attempt and in that regard, it’s valid, even without an immediate freeing. As said, you could achieve the same in a simpler fashion by just calling trimToSize() on the ArrayList. But note that if the capacity does not happen to match the size, trimToSize()’s shrinking of the array doesn’t work differently behind the scenes, it implies creating a new array and letting the old one become subject to garbage collection.

But the fact that there’s no immediate freeing and there’s rarely a need for immediate freeing should allow the conclusion that space saving attempts like this would only matter in practice, when the resulting object is supposed to persist a very long time. When the lifetime of the copy is shorter than the time to the next garbage collection, it didn’t save anything and all that remains, is the unnecessary creation of a copy. Since we can’t predict the time to the next garbage collection, we can only make a rough categorization of the object’s expected lifetime (long or not so long)…

The general approach is to assume that in most cases, the higher capacity of an ArrayList is not a problem and the performance gain matters more. That’s why this class maintains a higher capacity in the first place.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • 1
    beautiful answer! this is going to be tattooed to forehead of the fellas I don't like at work: _but for maintainable code, it’s recommended to limit the names to the smallest scope needed for their purpose_. – Eugene Nov 21 '19 at 13:52
0

No, it is done for the same reason as empty lines are added to the code.

The variables in the block are scoped to that block, and can no longer be used after the block. So one does not need to pay attention to those block variables.

So this is more readable:

A a;
{ B b; C c; ... }
...

Than:

A a;
B b;
C c;
...
...

It is an attempt to structure the code more readable. For instance above one can read "a declaration of A a; and then a block probably filling a.

Life time analysis in the JVM is fine. Just as there is absolutely no need to set variables to null at the end of their usage.

Sometimes blocks are also abused to repeat blocks with same local variables:

A a1;
{ B b; C c; ... a1 ... }
A a2;
{ B b; C c; ... a2 ... }
A a3;
{ B b; C c; ... a3 ... }

Needless to say that this is the opposite of making code better style.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138