5

In Effective Java by Joshua Bloch, when I was reading about static factory methods, I came across the following statement:

The ability of static factory methods to return the same object from repeated invocations allows classes to maintain strict control over what instances exist at any time. Classes that do this are said to be instance-controlled. There are several reasons to write instance-controlled classes. Instance control allows a class to guarantee that it is a singleton (Item 3) or noninstantiable (Item 4). Also, it allows an immutable class (Item 15) to make the guarantee that no two equal instances exist: a.equals(b) if and only if a==b. If a class makes this guarantee, then its clients can use the == operator instead of the equals(Object) method, which may result in improved performance. Enum types (Item 30) provide this guarantee.

To investigate how the == operator improves performance, I looked at String.java and saw this snippet:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

What does he mean by performance improvement here? How does == bring a performance improvement?

Does he mean the following?

If every class can assure that a.equals(b) if and only if a==b, it indirectly requires that there cannot be two objects at different memory locations holding the same data, which would waste memory. If they hold the same data, they are one and the same object; that is, they point to the same memory location.

Am I right in this inference?

If I am wrong, can you guide me in understanding this?

Hanky Panky
Harish Kayarohanam

6 Answers

3

If every class can assure that a.equals(b) if and only if a==b, it indirectly requires that there cannot be two objects at different memory locations holding the same data, which would waste memory. If they hold the same data, they are one and the same object; that is, they point to the same memory location.

Yes, that is what the author is driving at.

If, for a given class, you can call == (a single JVM opcode) instead of equals (a dynamically dispatched method call), it saves some overhead. This won't be possible for all classes; in particular, it cannot work for mutable classes.

It works this way for enums for example.

And even if someone called the equals method (which would be good defensive programming practice, you don't want to get into the habit of using == for objects IMHO), that method could be implemented as a simple == (instead of having to look at potentially complex object state).

Incidentally, even for "normal" equals methods (such as String's), it is probably a good idea in their implementation to first check for object identity and then short-cut looking at object state (which is what String#equals does, as you have found out).
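A minimal sketch of the enum case, using a hypothetical `Color` enum and helper class that are not part of the question: because the JVM creates exactly one instance per enum constant, identity comparison and `equals` always agree.

```java
// Hypothetical enum for illustration; each constant exists as a single instance.
enum Color { RED, GREEN }

final class EnumIdentityDemo {
    // Identity comparison: one reference check, and safe even if a is null.
    static boolean sameByIdentity(Color a, Color b) {
        return a == b;
    }

    // Enum.equals is final and is itself implemented as an identity check.
    static boolean sameByEquals(Color a, Color b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Color c = Color.valueOf("RED");                      // looked up by name at run time
        System.out.println(sameByIdentity(c, Color.RED));    // true
        System.out.println(sameByEquals(c, Color.RED));      // true
        System.out.println(sameByIdentity(c, Color.GREEN));  // false
    }
}
```

Both helpers give the same answer for every pair of constants, which is the guarantee the quoted passage describes.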

Thilo
  • +1 for giving `enum` as an example of what this *does* work for. – Jason C Nov 17 '13 at 05:26
  • I strongly disagree with the part concerning `cannot work for mutable classes`. Actually, it's `equals` what stops working whenever you put mutable objects into collection. The reference equality works *always*, it just has a different semantics. – maaartinus Nov 17 '13 at 08:01
  • @maaartinus: I am having a hard time wrapping my head around how interning would work with mutable objects. And `equals` cannot have different semantics from reference equality in this scenario (that they become interchangeable is the whole point of this exercise). – Thilo Nov 17 '13 at 10:57
  • @Thilo: Imagine a class `Person` with an unique immutable `id`. Assume two persons are equal if they have the same `id` (such assumptions are common when working with databases). Such a `Person` could be used as key in maps and it could be interned. Changing other attributes would work. – maaartinus Nov 17 '13 at 11:49
  • @maaartinus Ew. I would never make an `equals` method that only compared the `id` if there are other mutable fields, because those are part of the object's observable state. – C. K. Young Nov 17 '13 at 13:57
  • @Chris Jester-Young: But then your maps and sets containing those objects break anytime you change anything. A `Person` class with an id-based `equals` represents the person's identity, while one testing all properties is a sort of name, age, and whatever container. Both have their uses (though I'd prefer the former). – maaartinus Nov 17 '13 at 14:07
  • @maaartinus It depends on whether you consider your type to be a "value type" or "reference type". I feel that data access objects are value types. In the way we architect our code, we use an `Integer` for the map keys, not the data access objects themselves. We have a separate cache (with soft-referenced values, usually) for looking up those objects, if they are expensive to refetch. – C. K. Young Nov 17 '13 at 14:11
  • Your concept of using the `id` only while keeping everything else mutable reminds me of the use of stateful enum values. Personally I prefer to keep enum values stateless, especially if calling code uses the enum values directly. In limited special cases, such as if I'm implementing an enum-based singleton, stateful is okay; the enum is just an implementation detail. It's all about reducing surprises for users of your code. – C. K. Young Nov 17 '13 at 14:49
  • @Thilo: I think in the scenario being hinted at, there could never exist two distinct `Person` instances with the same ID. If one piece of code calls `CreatePerson("123-45-6789")` and no other piece of code has ever done so, it will create a new object with default properties. If another piece of code calls `CreatePerson("123-45-6789");` it will get a reference to that same mutable object. Effectively, a person-ID and a reference to a person become equivalent. – supercat Nov 17 '13 at 21:04
  • @maaartinus: Am I understanding you correctly [see my above comment]? – supercat Nov 17 '13 at 21:04
  • @supercat: I guess so. When using Hibernate, you can run into quite some problems with equals and one solution is basing it on ID alone. This brings other problems and interning the objects as you wrote seems to be helpful. – maaartinus Nov 17 '13 at 21:55
  • @maaartinus: I'm not quite clear where an `equals` method based on ID alone would be helpful or appropriate. If objects are mutable, and an objects ID is supposed to be its identity, then there should never *exist* two distinct objects with the same ID, so an override of `equals` based on ID should never behave differently from the default reference-equality test. What am I missing? – supercat Nov 17 '13 at 23:45
  • @supercat: How can you achieve that the distinct objects never exist? Your `CreatePerson` factory can do it, but what about objects coming from a database, deserialization, or whatever? You may want to intern them immediately, but having defined `equals` you don't have to. – maaartinus Nov 18 '13 at 00:13
  • @maaartinus: There must be no means of creating the object except via factories which will keep track of all extant instances. This isn't just to ensure that `equals` works. If `mailroom.morningWorker` and `payroll.afternoonWorker` are distinct `Person` objects with id #12345, what will happen if that person puts in $10 worth of overtime in both places, so the code does both `mailroom.morningWorker.currentWeekPay += 1000;` and `payroll.afternoonWorker.currentWeekPay += 1000`? Having two distinct objects, each with its own `currentWeekPay`, is a recipe for disaster. – supercat Nov 18 '13 at 15:50
  • @maaartinus: One could have a `Person` object encapsulate nothing except an immutable combination of a person ID and a database ID, in which case having duplicate instances would be harmless (though `equals` should check *both* IDs), but if the `Person` object encapsulates a `Person`'s mutable state, fundamental correctness requires either that the object be unique, or that one employ rather complex mechanisms to maintain consistency [and in the latter case, `equals` had better be testing more than just ID!] – supercat Nov 18 '13 at 15:54
  • @supercat: "recipe for disaster" - maybe... I'm not sure if you can avoid the creation of another instance with serialization, but I agree that replacing it ASAP is important. "immutable combination of a person ID and a database ID" what database ID? Without the DB ID, it's just like Chris Jester-Young's solution, just using a more fancy `Integer`. And no, testing anything mutable in `equals` doesn't save from any problem here; it can only cause many others. – maaartinus Nov 18 '13 at 16:32
  • @maaartinus: The only scenario I can see in which it would make sense for `Person` to be mutable would be if it *is* an entity; with relatively few exceptions, *entities cannot be serialized*--only information about them. If someone were to transcribe your genetic pattern, it might in theory be possible to create a person from that transcribed genetic pattern, but *that person wouldn't be you*. As for "database ID", perhaps I should have said that a `Person` could encapsulate whatever information is necessary to access a single store of information somewhere which represents the person. – supercat Nov 18 '13 at 16:49
  • @maaartinus: An alternative approach would be for various departments to hold `PersonPartialPay` objects, and have for each employee a list of all `PersonPartialPay` objects which are attached to them. In that case, if someone is employed in multiple departments, each could have an object that it could modify at will, but the object associated with employee #12345's employment in the mailroom would be distinct from the one associated with his employment in the payroll office, and the two should not be considered "equal" even if both offices pay him the same amount. – supercat Nov 18 '13 at 17:09
  • @supercat: Shouldn't we keep our discussion short? It gets a bit philosophical now. Keeping aside `PersonPartialPay` which are separate objects, I used something like mutable serializable `Person` with an ID-based `equals` without interning. It's not perfect, but as good as it could get. – maaartinus Nov 18 '13 at 19:11
  • @maaartinus: If that design works for you, great, though it doesn't make sense to me. To my mind, Objects should generally either encapsulate data, in which case only objects which match totally should be equal, or entities, in which case only objects representing the same entity should be equal. Your object sounds like a weird hybrid of a sort that I would want to split into a separate entity part and data part, but perhaps you can get by without such subdivision. – supercat Nov 18 '13 at 19:44
3

What the quoted portion means is that an immutable class can choose to intern its instances. This is easy to implement via Guava's Interner, for example:

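import com.google.common.base.Objects;
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

public class MyImmutableClass {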
    private static final Interner<MyImmutableClass> INTERN_POOL = Interners.newWeakInterner();
    private final String foo;
    private final int bar;

    private MyImmutableClass(String foo, int bar) {
        this.foo = foo;
        this.bar = bar;
    }

    public static MyImmutableClass of(String foo, int bar) {
        return INTERN_POOL.intern(new MyImmutableClass(foo, bar));
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(foo, bar);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;        // fast path for interned instances
        if (o instanceof MyImmutableClass) {
            MyImmutableClass rhs = (MyImmutableClass) o;
            return Objects.equal(foo, rhs.foo)
                    && bar == rhs.bar;
        }
        return false;
    }
}

Here, the constructor is made private: all instances have to be created through the MyImmutableClass.of() factory method, which uses the Interner to ensure that if the new instance is equals() to an existing instance, the existing instance is returned instead.
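For readers without Guava on the classpath, the same idea can be sketched with only the JDK. The `InternedPoint` class below is hypothetical, and unlike `Interners.newWeakInterner()` its pool holds strong references, so pooled instances are never garbage collected; treat it as an illustration rather than a drop-in replacement.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable value class with a private constructor and a static factory.
final class InternedPoint {
    // Assumption: a strong-reference pool is acceptable (entries live forever).
    private static final ConcurrentMap<InternedPoint, InternedPoint> POOL =
            new ConcurrentHashMap<>();

    private final int x;
    private final int y;

    private InternedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    static InternedPoint of(int x, int y) {
        InternedPoint candidate = new InternedPoint(x, y);
        // putIfAbsent returns the previously pooled equal instance, or null if none existed.
        InternedPoint existing = POOL.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // Value-based equals is still needed so the pool can find the existing instance.
    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof InternedPoint)) return false;
        InternedPoint p = (InternedPoint) o;
        return x == p.x && y == p.y;
    }
}
```

With this in place, `InternedPoint.of(1, 2) == InternedPoint.of(1, 2)` holds, so callers may compare by reference.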

Interning can only be used for immutable objects, by which I mean objects whose observable state (i.e., the behaviour of all its externally-accessible methods, in particular equals() and hashCode()) does not change for the objects' lifetimes. If you intern mutable objects, the behaviour will be wrong when an instance is modified.

As many other people have already stated, you should carefully choose which objects to intern, even if they're immutable. Only do it if the set of interned values is small relative to the number of duplicates you are likely to have. For example, it's not worth interning Integer generally, because there are over 4 billion possible values. But it is worth interning the most commonly-used Integer values, and in fact, Integer.valueOf() interns values between -128 and 127. On the other hand, enums are great to intern (and they are interned, by definition) because the set of possible values is small.
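The `Integer.valueOf()` behaviour can be checked with a short sketch (the class and method names are made up for illustration). Only the range -128 to 127 is guaranteed to be cached; outside it the result of `==` depends on the JVM and its settings, so the example makes no claim about it.

```java
final class IntegerCacheDemo {
    // True when two separate valueOf calls hand back the very same object.
    static boolean sameBoxedInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range, identity comparison works.
        System.out.println(sameBoxedInstance(127));   // true
        System.out.println(sameBoxedInstance(-128));  // true

        // Outside the range, rely on equals(); == may or may not be true.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```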

For most classes in general, you'd have to do heap analysis, such as by using jhat (or, to plug my own project, fasthat), to decide if there are enough duplicates to warrant interning. In other cases, just keep it simple and don't intern.

C. K. Young
  • Yes. I want some clarification. After reading your answer I saw https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/ and it says "Note that ALL Strings which are hardcoded (as constant or anywhere in code) are automatically interned by the compiler." So my question is: when does string interning happen? Always? Is that note indirectly hinting at when string interning will not happen? – Harish Kayarohanam Nov 17 '13 at 06:44
  • @Harish Kayarohanam: There's no automatic *process* of string interning. It's just that all the strings appearing as constant in your source code start their lives in the interning pool already. Whenever you write `String a = "x", b = "x";` then you can be sure that `a == b ` holds. Similarly for things like `Something.class.getName()`, I guess. Whenever you call `s.substring()`, you can bet it doesn't get interned automatically. – maaartinus Nov 17 '13 at 08:07
  • @Chris Jester-Young: I'd slightly refine your statement about mutable classes. It can work as long as no field included in `equals` changes (no need to speak about `hashCode` here, as all fields therein must be included in `equals` too). Interning mutable objects has exactly the same problem as using them as keys in collections. – maaartinus Nov 17 '13 at 08:12
  • @maaartinus can you direct me to some video tutorials or resources that can explain these interning and mutability concepts ? – Harish Kayarohanam Nov 17 '13 at 08:14
  • @Harish Kayarohanam: I doubt there's anything beyond the [JLS](http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.5). Note that interning is not as cool as it looks. It has its cost, as pointed out in another answer, and it's rarely worth it. Be warned about premature optimization. You might want to have a look at [hash code caching](https://groups.google.com/d/msg/project-lombok/ImSQ7X7PeBo/2QnBotCWdGMJ). – maaartinus Nov 17 '13 at 08:20
  • @maaartinus Does this mean that interning really benefits only when the number of instances is small? As Stephen suggested, if there is a large number of instances, then the hash table itself will occupy a lot of memory and the searching will also take up memory. Am I right in my inference? – Harish Kayarohanam Nov 17 '13 at 08:30
  • But from https://blog.codecentric.de/en/2012/03/save-memory-by-using-string-intern-in-java/ I get the sense that when more instances are created, interning saves a tremendous amount of space. His example: "From this screenshot we can see that empty Strings take a lot of memory! 2 million empty Strings take a total of 130MB". 130MB is huge, right? – Harish Kayarohanam Nov 17 '13 at 08:33
  • So the two inferences in the two comments above contradict each other. When is interning beneficial? Will the time and memory spent in searching be far less than the 130 MB (for the same example)? Does this tradeoff suggest that interning is beneficial under both conditions, fewer objects and more objects? – Harish Kayarohanam Nov 17 '13 at 08:35
  • @Harish Kayarohanam: Sure you can save a lot memory if there are many equal instances. And you waste some memory and time if there are only a few. It's rather obvious: The bigger the instance and the bigger the number of equal ones, the more you can save. Most of the time interning is not profitable, but in some cases it may help a lot. – maaartinus Nov 17 '13 at 09:06
  • @maaartinus Can you give instances where interning is not profitable? Can you direct me to some examples, so that I learn not to use intern badly, please? – Harish Kayarohanam Nov 17 '13 at 09:09
  • I have raised the doubt here http://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern as a comment to his answer – Harish Kayarohanam Nov 17 '13 at 09:23
  • @Harish Interning can be useful if the universe of possible values is bounded and there are potentially many duplicates. In that case, interning is justifiable for the memory savings. It's usually not really about the runtime savings, and like all the other comments say, you have to be judicious about which classes to intern. For example, you shouldn't intern `Integer` generally (though for small values, like between -128 and 127, those _are_ interned if you use the `Integer.valueOf()` factory method; but that's another instance of "bounded universe"). – C. K. Young Nov 17 '13 at 14:04
  • @HarishKayarohanam Interning is a very simple concept. The best I can do to teach you how to not use it badly is to simply say: Don't use it right now. The nature of your questions suggests that you do not understand some fundamental concepts, and so you are not ready to be asking these questions. If interning an item makes sense, use it. If not, don't. If you are unsure, don't, and spend time learning fundamental concepts instead. If you can't make a logical conclusion about its usefulness, you will likely have bigger issues. – Jason C Nov 17 '13 at 18:16
  • I mean, this is like asking "When should I use the + operator?" Well, when you want to add numbers, duh. It's that obvious, and if you had to ask that question, then you're missing some really fundamental concepts of writing computer programs and asked the wrong question. – Jason C Nov 17 '13 at 18:18
1

If you can guarantee that no two instances of an object exist such that their semantic values are equivalent (i.e. if x and y refer to different instances [x != y] then x.equals(y) == false for all x and y), then this implies that you can compare two references' objects for equality simply by checking to see if they refer to the same instance, which is what == does.

The implementation of == essentially just compares two integers (memory addresses) and generally would be faster than virtually all nontrivial implementations of .equals().

It is worth noting that this is not a jump that can be made for Strings, as you cannot guarantee that any two instances of a String are not equivalent, e.g.:

String x = new String("hello");
String y = new String("hello");

Since x != y && x.equals(y), it is not sufficient to just do x == y to check for equality.
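A small runnable sketch of this point; the class and helper names are made up for illustration. `new` always creates a distinct object, while `String.intern()` maps equal strings to a single pooled instance.

```java
final class StringIdentityDemo {
    // Compares the canonical (pooled) representatives of two strings by reference.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String x = new String("hello");
        String y = new String("hello");
        System.out.println(x == y);                 // false: two distinct objects
        System.out.println(x.equals(y));            // true: same characters
        System.out.println(sameAfterIntern(x, y));  // true: both map to the pooled instance
        System.out.println(x.intern() == "hello");  // true: literals start out in the pool
    }
}
```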

Jason C
  • If one wanted to have this system for Strings, the JVM would need to be re-spec'd to call `intern` on all new String objects (`==` does work for interned Strings). – Thilo Nov 17 '13 at 05:28
  • But your example is wrong, because `x` and `y` both point to the same string literal, and string literals are always interned in Java. If you changed your example to use `new String("hello")` for at least one of those variables, then your assertion of `x != y` would indeed be true. – C. K. Young Nov 17 '13 at 05:39
  • What am I to infer from this code? `public class HelloWorld{ public static void main(String []args){ String x = new String("hello"); String y = new String("hello"); System.out.println(x == y); String a = "hello"; String b = "hello"; System.out.println(a == b); } }` The output is `false` then `true`. Why does interning not happen in the first case, with `new String()`? Is it because we explicitly ask for a new instance using `new String()`? – Harish Kayarohanam Nov 17 '13 at 08:39
  • yes I got solution here . http://www.ntu.edu.sg/home/ehchua/programming/java/J3d_String.html – Harish Kayarohanam Nov 17 '13 at 08:52
  • Even minimal knowledge of Java objects, the language, and 'new' would have let you arrive at that conclusion on your own. You really need to get back to language basics, check out the official tutorials on the oracle site. – Jason C Nov 17 '13 at 18:20
1

To answer your questions ...

By performance improvement what does he mean here [String]? How does it bring a performance improvement?

This is NOT an example of what Bloch is talking about. Bloch is talking about instance-controlled classes, and String is not such a class!

Am I right in this inference?

Yes that is correct. An instance-controlled class for which the instances are immutable can ensure that objects that are "the same" will always be equal according to the == operator.

Some observations though:

  • This only applies to immutable objects. Or more precisely to objects where mutation does not affect the semantics of equality.

  • This only applies to fully instance-controlled classes.

  • Instance control can be expensive. Consider the form of (partial) instance control provided by the String class's intern method and the string pool.

    • The string pool is effectively a hash table of weak references to String objects. This occupies extra memory.

    • Each time you intern a String, it will calculate the string's hash code and probe the hash table to see if a similar string has already been interned.

    • Each time a full GC is performed, the weak references in the string pool result in extra "tracing" work for the GC, and then potentially more work if the GC decides to break references.

    You typically get similar overheads when you implement your own instance-controlled classes. When you do cost-benefit analysis, these overheads count against the benefits of faster instance comparison.
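The overheads listed above can be seen in a rough sketch of a hand-rolled pool built on weak references. `WeakPool` is a hypothetical helper, not the JVM's actual string pool implementation: every `intern` call pays for a hash computation and a table probe, and the table plus its `WeakReference` objects take extra memory.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical canonicalizing pool; T must be immutable with value-based equals/hashCode.
final class WeakPool<T> {
    // Weak keys and weakly referenced values let unused instances be collected.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    synchronized T intern(T value) {
        // Cost per call: value.hashCode(), a table probe, and possibly equals().
        WeakReference<T> ref = pool.get(value);
        T existing = (ref != null) ? ref.get() : null;
        if (existing != null) {
            return existing;          // an equal instance is already pooled
        }
        pool.put(value, new WeakReference<>(value));
        return value;                 // this instance becomes the canonical one
    }
}
```

A caller would route every newly created instance through `intern` and keep only the returned reference; whether the saved comparisons and memory outweigh these costs is the cost-benefit question raised above.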

Stephen C
0

I think it means this:

If you need to test two complex structures for equality you generally need to do a lot of tests to make sure they are the same.

But if, because of some trick of the language, you knew that two complex but equal structures can't exist simultaneously, then instead of verifying equality by comparing them bit by bit, you can just verify that they are in the same location in memory and return false if they are not.

If anyone can create objects, then you can't guarantee that two objects that are the same but are distinct instances won't be created. But if you control the creation of objects and only create distinct objects, then you don't need complex equality tests.

odedsh
0

In cases where complicated values are encapsulated using references to immutable objects, there are generally three scenarios that can arise when comparing two references:

  • They are references to the same object (very fast)

  • They are references to different objects which encapsulate different values (often fast, but sometimes slow)

  • They are references to different objects which encapsulate the same value (generally slow)

If objects will be found to be equal more often than not, there can be substantial value to minimizing the frequency of case 3. If objects will often be very nearly equal, there can also be substantial value to ensuring that the slow subcases of case 2 don't happen very often.

If one makes certain that for any given value there will never be more than one object which holds that value, code which observes that two references identify different objects may infer that they encapsulate different values, without having to actually examine the values in question. The value of doing this is often somewhat limited, however. If the objects in question are large, complicated, nested collections which will sometimes be very similar, one may have each collection compute and cache a 128-bit hash of its contents; two collections with different content are unlikely to have matching hash values, and collections with different hash values may quickly be recognized as unequal. On the other hand, having references that encapsulate the same content generally identify the same object, even if a few references to identical collections exist, can improve the performance of the otherwise-always-bad "equals" case.

An approach that one could use if one didn't want to use a separate interning collection would be to have each object keep a long sequence number such that one can always determine which of two otherwise-identical objects was created first, along with a reference to the oldest object which is known to hold the same content. To compare two references, start by identifying the oldest object known to be equivalent to each. If the oldest object known to match the first isn't the same as the oldest object known to match the second, compare the objects' contents. If they match, one will be newer than the other, and that object can regard the other as the "oldest object known to match".
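One possible reading of this scheme, as a rough sketch: the `Blob` class, its `sameValue` method, and the `slowComparisons` counter are all hypothetical names introduced here, the `String` field stands in for a large value that is expensive to compare, and the code is not thread-safe.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class; "content" stands in for a large, expensive-to-compare value.
final class Blob {
    private static final AtomicLong NEXT_SEQ = new AtomicLong();
    static int slowComparisons = 0;            // instrumentation for the demo only

    private final long seq = NEXT_SEQ.getAndIncrement(); // lower means created earlier
    private final String content;
    private Blob oldestKnownMatch = this;      // assumption: single-threaded use

    Blob(String content) {
        this.content = content;
    }

    // Follow the links to the oldest object known to hold the same content.
    private Blob oldest() {
        Blob root = this;
        while (root.oldestKnownMatch != root) {
            root = root.oldestKnownMatch;
        }
        oldestKnownMatch = root;               // shorten the chain for next time
        return root;
    }

    boolean sameValue(Blob other) {
        Blob a = this.oldest();
        Blob b = other.oldest();
        if (a == b) return true;               // fast path: already known to match
        slowComparisons++;
        if (!a.content.equals(b.content)) return false;
        // Contents match: the newer representative now defers to the older one.
        if (a.seq < b.seq) b.oldestKnownMatch = a; else a.oldestKnownMatch = b;
        return true;
    }

    public static void main(String[] args) {
        Blob x = new Blob("payload"), y = new Blob("payload");
        System.out.println(x.sameValue(y));    // true, after a full content comparison
        System.out.println(x.sameValue(y));    // true, via the fast path this time
        System.out.println(slowComparisons);   // 1
    }
}
```

Only the first comparison of two equal objects pays for the content check; later comparisons resolve both sides to the same oldest object and succeed by reference.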

supercat