Basically, you named the most important reason why retrofitting these classes as value-based is not feasible: backward compatibility.
You might have noticed that the constructors of the primitive wrapper types have been deprecated in Java 9, which would be a step in that direction. Still, using identity-sensitive operations is only discouraged, not forbidden, so a change that breaks compatibility cannot be made on that basis. But potentially breaking identity-sensitive operations would be the only thing that could enable practical advantages from being a value-based class.
For classes like String, BigInteger, and BigDecimal, the JDK developers did not even dare to take the step of deprecating the constructors, most likely because that would be too disruptive. For some constructors, there’s not even an equivalent factory method.
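To illustrate why the public constructors are a problem, here is a minimal sketch (the class and method names are made up for this example). Every new expression has to yield an object with its own identity, so existing code is entitled to observe that two equal values are still distinct objects:

```java
import java.math.BigInteger;

// Hypothetical demo class; only String and BigInteger are real JDK types here.
class ConstructorIdentityDemo {

    // Two Strings built via the constructor are equal but never the same object.
    static boolean distinctStrings() {
        String a = new String("value");
        String b = new String("value");
        return a != b && a.equals(b);
    }

    // The same holds for BigInteger(String), which has no factory method counterpart.
    static boolean distinctBigIntegers() {
        BigInteger x = new BigInteger("42");
        BigInteger y = new BigInteger("42");
        return x != y && x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(distinctStrings());
        System.out.println(distinctBigIntegers());
    }
}
```

As long as such constructors exist and are not even deprecated, code like this is legal and its result is specified.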
But there’s more than just the public constructors.
See the documentation of the valueOf methods, for example the one of Integer:
This method will always cache values in the range -128 to 127, inclusive, …
So when the factory method is used, you still get a specified identity behavior for some cases.
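A small sketch of what that documented caching means for identity comparisons (class and method names are invented for illustration). Only the range -128 to 127 is guaranteed; what happens outside of it depends on the implementation and its configuration, so the example makes no claim about that case:

```java
// Hypothetical demo class around the real Integer.valueOf(int) method.
class ValueOfCacheDemo {

    // Compares the identity of two separately requested Integer objects.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // Guaranteed by the documentation quoted above.
        System.out.println(sameInstance(-128));
        System.out.println(sameInstance(127));
        // Outside the documented range: unspecified, do not rely on either result.
        System.out.println(sameInstance(128));
    }
}
```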
Which brings us to JLS §5.1.7:
If the value p being boxed is the result of evaluating a constant expression (§15.29) of type boolean, byte, char, short, int, or long, and the result is true, false, a character in the range '\u0000' to '\u007f' inclusive, or an integer in the range -128 to 127 inclusive, then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.
So even the language specifies the behavior of certain identity sensitive operations.
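In code, the rule quoted from §5.1.7 looks roughly like this. It is a sketch with made-up class and method names, and it deliberately stays within constant expressions in the guaranteed ranges:

```java
// Hypothetical demo class; each method boxes the same compile-time constant twice.
class BoxingIdentityDemo {

    static boolean boxedIntConstants() {
        Integer a = 127; // constant expression in the range -128 to 127
        Integer b = 127;
        return a == b;
    }

    static boolean boxedCharConstants() {
        Character a = 'x'; // character in the range '\u0000' to '\u007f'
        Character b = 'x';
        return a == b;
    }

    static boolean boxedBooleanConstants() {
        Boolean a = true;
        Boolean b = true;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(boxedIntConstants());
        System.out.println(boxedCharConstants());
        System.out.println(boxedBooleanConstants());
    }
}
```

Each comparison is an identity-sensitive operation whose outcome the language itself promises.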
Note that the specification tries not to name the valueOf method that the compiled code will use in practice, and instead makes up its own rules (which formally only apply to compile-time constants), which did not really pay off. As this answer documents, that part of the specification underwent several rewrites, so for anyone who took the wording literally, the guarantees changed over time…
Other guarantees are burned into developers’ minds much more deeply:
JLS §15.29, Constant Expressions:
Constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
This is a guarantee about object identity that is impossible to take back.
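A minimal sketch of that interning guarantee, with invented names. PREFIX is a constant variable, so PREFIX + "based" is itself a constant expression in the sense of §15.29:

```java
// Hypothetical demo class showing identity of String constant expressions.
class InternedConstantsDemo {

    static final String PREFIX = "value-"; // constant variable

    // Two occurrences of the same literal refer to the same object.
    static boolean literalsShareIdentity() {
        String a = "value-based";
        String b = "value-based";
        return a == b;
    }

    // A concatenation of constants is folded at compile time and interned too.
    static boolean constantConcatenationIsInterned() {
        return (PREFIX + "based") == "value-based";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());
        System.out.println(constantConcatenationIsInterned());
    }
}
```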
Interestingly, JLS §15.18.1 states:
The String object is newly created (§12.5) unless the expression is a constant expression (§15.29).
It’s not clear whether this strict wording is intentional, but as written, it states that non-constant string concatenation must produce a new object with a distinct identity. Yet another specified behavior that developers should not rely on.
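For contrast with the constant case, here is a sketch of a non-constant concatenation (names are made up). The identity comparison in main is printed without a claimed result, precisely because it is the kind of behavior you should not build on; only the content equality and the effect of intern() are checked:

```java
// Hypothetical demo class; concat's operands are parameters, hence not constant expressions.
class ConcatIdentityDemo {

    static String concat(String left, String right) {
        return left + right; // evaluated at run time
    }

    // The content is equal to the literal, regardless of identity.
    static boolean equalContent() {
        return concat("value-", "based").equals("value-based");
    }

    // Explicit interning yields the same object as the literal.
    static boolean internedMatchesLiteral() {
        return concat("value-", "based").intern() == "value-based";
    }

    public static void main(String[] args) {
        System.out.println(equalContent());
        System.out.println(internedMatchesLiteral());
        // Identity of the raw concatenation result: do not rely on this.
        System.out.println(concat("value-", "based") == "value-based");
    }
}
```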
So, if someone were to design a new language without legacies, there would be nothing wrong with designing these types as value types in the first place. The designer just has to avoid putting all those guarantees into the specification that were thought to be a good idea in the past.