Why doesn't the Java compiler optimize away new when the end result is an immutable copy of an interned object?

Question

We have the behaviour that the Java compiler uses the same instance when a string constant is used:

  String a = "abc";
  String b = "abc";
  //a == b
  String c = new String("abc");
  // c is a brand new object on the heap;

Why doesn't the Java compiler optimize away the new String and substitute the equivalent assignment? Was there some deep design decision, or is it just a coincidence? Can we expect a different JVM or compiler to be more aggressive and actually replace heap instances of immutable objects with well-known static ones? While String is the most notorious example, the same behaviour could apply to Integer, for example.

Explicit expressions can be optimized by the JIT compiler. Why not this specific case? — Dr Phil, May 13 '19 at 12:52
You might be interested in JEP-192 (String deduplication) https://openjdk.java.net/jeps/192 — René Link, May 13 '19 at 12:55
https://stackoverflow.com/questions/54918661/double-literally-result-false — Ng Sharma, May 13 '19 at 13:04
My question is more about why Java/the JIT decided to do so. For example, I explicitly state in the code that a and b are two different variables with two constants that happen to have the same value. Why optimize one of them and not the new? — Dr Phil, May 13 '19 at 13:13
Java is not C++. If you write `new`, you *always* get a new object. There are no exceptions to this rule. — VGR, May 13 '19 at 13:16
What about the performance impact of interning? Also, what if two threads create the same new string at the same instant? Would interning involve locking then? That can't be good for performance if every string creation obtains a global lock on a segment of the string pool. — Teddy, May 13 '19 at 13:35
If you are just having fun, do not miss this one either: https://ideone.com/M5kgRA - `Integer` has some kind of ["interning"](https://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#valueOf(int)) too :-) — tevemadar, May 13 '19 at 13:37
For new and clone, interning can be considered as blocked by the contract of the method. But, what about string concatenation, and string builder cases? The contract is flexible in that case. So, the ultimate reason should be performance! — Teddy, May 13 '19 at 13:38
By definition the `new` operator is used to create a new instance of a class, mutable or not. — Maurice Perry, May 13 '19 at 13:46
That's an interesting article on how String.intern() works and how that evolved over time: http://java-performance.info/string-intern-in-java-6-7-8/ — Nestor Milyaev, May 13 '19 at 13:48
@DrPhil to answer your comment on *why*: `String` is the only class using auto-interning at compile time. Its content can use up a lot of memory and auto-interning was implemented to ease things up. — Matthieu, May 13 '19 at 13:51

score 1 · Accepted Answer · answered May 13 '19 at 14:25

First of all, the String(String) "copy" constructor stems from the early days of Java and is an anomaly. Maybe it exists because of String.intern(), which, like the "..." constants, does a bit of copy prevention. It is practically never needed, as String is an immutable final class.
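To make the relationship between literals, the copy constructor and String.intern() concrete, here is a minimal sketch. The class and method names (StringIdentityDemo and its helpers) are made up for illustration and are not from the original post.

```java
// Hypothetical demo class; the names are illustrative only.
class StringIdentityDemo {

    // Two identical literals refer to the same interned instance.
    static boolean literalsShareInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // `new` always yields a distinct object, even for an equal value.
    static boolean copyIsSameInstance() {
        String a = "abc";
        String c = new String("abc");
        return a == c;
    }

    // intern() maps the copy back to the pooled instance.
    static boolean internedCopyIsSameInstance() {
        String a = "abc";
        String c = new String("abc");
        return a == c.intern();
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());      // true
        System.out.println(copyIsSameInstance());         // false
        System.out.println(internedCopyIsSameInstance()); // true
    }
}
```

The copy still compares equal with equals(); only the reference identity differs.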

For Integer there is Integer.valueOf(int), which uses a cache of instances that by default holds -128 to 127.
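A small sketch of that cache, using a hypothetical IntegerCacheDemo class. It only compares references inside the documented -128 to 127 range; whether values outside that range share an instance depends on the JVM and its settings, so the sketch compares those with equals() instead.

```java
// Hypothetical demo class; the names are illustrative only.
class IntegerCacheDemo {

    // Values in -128..127 are served from the cache, so the references match.
    static boolean cachedValuesShareInstance() {
        Integer x = Integer.valueOf(127);
        Integer y = Integer.valueOf(127);
        return x == y;
    }

    // Outside the cached range, rely on equals() rather than on identity.
    static boolean largeValuesAreEqual() {
        Integer x = Integer.valueOf(1000);
        Integer y = Integer.valueOf(1000);
        return x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(cachedValuesShareInstance()); // true
        System.out.println(largeValuesAreEqual());       // true
    }
}
```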

Despite the very competent compiler development team involved, the Java bytecode compiler compiles very naively. On the way from bytecode to machine code, however, some nice things may happen. For instance, an object may not be created on the heap as such, but on the stack.

Simplistic compilation is at least less likely to contain errors in the data-flow analysis of a smart trick. (It also provides a good reason for good code style.)

An example:

List<String> list = ...
String[] array1 = list.toArray(new String[0]);
String[] array2 = list.toArray(new String[list.size()]);

toArray needs an actual array instance because, due to type erasure, the List no longer knows at run time that it contains Strings.

Historically, as an optimization, one could pass an array of fitting size (here the version with list.size()), which would then be filled and returned. That looks more optimal and faster, and some style checkers still flag the first version. Actually, however, the first version is faster, because a different bytecode array instantiation is used, and array1 will be generated fractionally faster.
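The observable difference between the two calls can be sketched as follows. The sketch says nothing about speed; it only shows which array comes back. ToArrayDemo and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ToArrayDemo {

    // Zero-length argument: toArray allocates and returns a new String[].
    static String[] viaEmptyArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Pre-sized argument: the list fits, so the very same array is returned.
    static boolean presizedArrayIsReturned(List<String> list) {
        String[] presized = new String[list.size()];
        return list.toArray(presized) == presized;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(Arrays.toString(viaEmptyArray(list))); // [a, b, c]
        System.out.println(presizedArrayIsReturned(list));        // true
    }
}
```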

The same story holds for division by certain numbers. In C there are many compiler optimisations that use faster shifts. In Java this is (partly) done in the bytecode-to-machine-code compilation, a more logical place for these optimisations.

I personally think an optimizing bytecode compiler would be nice, maybe something for university projects. However, it might not be justifiable just for code improvements, like not using .equals for enum values.
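As a hedged illustration of why such a rewrite is not a plain textual substitution: for int, division by 8 and an arithmetic shift by 3 agree for non-negative values but differ for negative ones, so an optimizer has to account for rounding. DivisionVsShiftDemo is a hypothetical name.

```java
// Hypothetical demo class; the names are illustrative only.
class DivisionVsShiftDemo {

    // Integer division truncates toward zero.
    static int divideByEight(int x) {
        return x / 8;
    }

    // Arithmetic right shift rounds toward negative infinity.
    static int shiftByThree(int x) {
        return x >> 3;
    }

    public static void main(String[] args) {
        System.out.println(divideByEight(16)); // 2
        System.out.println(shiftByThree(16));  // 2
        System.out.println(divideByEight(-7)); // 0
        System.out.println(shiftByThree(-7));  // -1
    }
}
```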

Matthieu · Answer 2 · 2019-05-13T13:49:20.670

Strings are a bit different from other objects: they are widely used and often act like "native types" (like int, float, ...), but they are in fact backed by arrays (i.e. they do not have a fixed memory size). Storing the same content over and over can waste a lot of memory (that has happened to me before). String interning was introduced to save developers the hassle of writing their own String pool.

The compiler interns String constants automatically. Integers have a similar mechanism, but you need to call it explicitly through Integer.valueOf(int).

In your case, paraphrasing @BenjaminUrquhart, you explicitly told it to create a new instance by calling new, and new is required to create a new instance. There are situations where that is required, e.g. when you call obj.clone(), you expect a new object, not a new reference to obj.

Note that, in the case of clone(), returning a new instance does not sound mandatory, but rather a "general intent" (quoting Javadoc):

[clone()] Creates and returns a copy of this object. The precise meaning of "copy" may depend on the class of the object. The general intent is that, for any object x, the expression:
 x.clone() != x
will be true, (...)

So strictly speaking, it appears that you could return the same instance in that case, but it is not considered "good practice" (or at least, it is not something callers expect).
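A minimal sketch of what that would look like for an immutable class. ImmutablePoint is a hypothetical class invented for this example, and returning this from clone() deliberately departs from the "general intent" quoted above. It compiles and runs, but callers may not expect it.

```java
// Hypothetical immutable class; not from the original post.
final class ImmutablePoint implements Cloneable {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // Unconventional: hands back the same instance instead of a copy.
    @Override
    public ImmutablePoint clone() {
        return this;
    }

    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        System.out.println(p.clone() == p); // true
    }
}
```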

I guess the looser wording could have to do with shallow copies of arrays, where the cloned array itself is a different instance but each element still references the same object as in the original array (see JLS §10.7), so the returned copy is not totally independent of the original.
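The shallow-copy behaviour of array clone() can be sketched like this. ArrayCloneDemo is a hypothetical name, and StringBuilder is used only because it is a convenient mutable element type.

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayCloneDemo {

    // The clone is a different array object.
    static boolean cloneIsDistinctArray(StringBuilder[] original) {
        return original.clone() != original;
    }

    // The elements are not copied: both arrays reference the same objects.
    static boolean cloneSharesElements(StringBuilder[] original) {
        StringBuilder[] copy = original.clone();
        return copy[0] == original[0];
    }

    public static void main(String[] args) {
        StringBuilder[] original = { new StringBuilder("abc") };
        System.out.println(cloneIsDistinctArray(original)); // true
        System.out.println(cloneSharesElements(original));  // true
    }
}
```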

if the object is immutable without any side effects why wouldn't returning the same instance in clone work ? — Dr Phil, May 13 '19 at 13:14
@DrPhil because the [*general intent* of `clone()`](https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#clone()) is that it should return a different instance. I will update my answer with the quote from the Javadoc. — Matthieu, May 13 '19 at 13:27
But then the intent of cloning an object is to get a copy, which I can "mutate" or modify without affecting the original. For immutable objects cloning is redundant. Like it is for an enum — Dr Phil, May 13 '19 at 13:48
@DrPhil, it all depends on the sense you put in the term "redundant". It sounds like it is, for your use case; but Java decided that it should not be default behaviour to return the same instance for immutables, rather it did not forbid it. If you want to return the same instance, go ahead, but it should be *your* choice. I don't think the compiler has any knowledge of an object's "immutability", which is a concept more than an attribute... — Matthieu, May 13 '19 at 13:55
