Java hashCode, artificial fields?

Question

Imagine the following problem:

    // Class PhoneNumber implements hashCode() and equals()
    PhoneNumber obj = new PhoneNumber("mgm", "089/358680");
    System.out.println("Hashcode: " +
        obj.hashCode());  // prints "1476725853"

    // Add PhoneNumber object to HashSet
    Set<PhoneNumber> set = new HashSet<>();
    set.add(obj);

    // Modify object after it has been inserted
    obj.setNumber("089/358680-0");

    // Modification causes a different hash value
    System.out.println("New hashcode: " +
        obj.hashCode()); // prints "7130851"

    // ... Later or in another class, code such as the following
    // is operating on the Set:

    // Unexpected Result!
    // Output: obj is set member: FALSE
    System.out.println("obj is set member: " +
        set.contains(obj));

Suppose I have a class whose fields should all remain editable, and I still want to be able to put its instances in a Set (and therefore rely on hashCode). Would it be a good idea to add an artificial, uneditable field to the class that is set when the object is created, for example the current time in ms? With that field in place, I could base the hash code on it and still be able to edit all the "real" fields.

Only if that behaviour makes sense. How are you going to look something up in the set based on an "artificial" field? — Oliver Charlesworth, Feb 15 '15 at 11:09
of course, thats called "data encapsulation" : https://en.wikipedia.org/wiki/Data_encapsulation — specializt, Feb 15 '15 at 11:09
Answer to your last question: NO. That would defeat the entire purpose of hashing – quick search ability. You'll need to post the `hashCode()` of PhoneNumber. If the object is mutable (like in this case; setNumber(..)), then your hashCode(..) needs to calculate the hash everytime its invoked. — UltraInstinct, Feb 15 '15 at 11:09
about the hashCode : dont use hashCode _or_ create a sensible hashCode()-method which creates _NO_ _COLLISION_, meaning you will have an upper limit of 2^32 unique elements — specializt, Feb 15 '15 at 11:11
possible duplicate of [Mutable objects and hashCode](http://stackoverflow.com/questions/4718009/mutable-objects-and-hashcode) — Joe, Feb 15 '15 at 11:11
@specializt: Huh? You can't avoid hashCode if you're using a HashSet. — Oliver Charlesworth, Feb 15 '15 at 11:12
@specializt: what is a "normal" set? If you want O(1) lookup, then you have to use hashes. — Oliver Charlesworth, Feb 15 '15 at 11:14
...i think you should do some research on java data types, there is a lot for you to discover. The thing about O(1) is completely irrelevant here, there is no requirement in his question — specializt, Feb 15 '15 at 11:16
@specializt: this is all nonsense. HashSet is the *de facto* standard choice, because of its performance characteristics. Using a hashCode does not limit you to 2^32 elements. Switching to, say, a TreeSet would not fix the OP's problem. — Oliver Charlesworth, Feb 15 '15 at 11:18
yes, you really need to do a lot of research - please do so now, without further knowledge you will only create even more awkward situations like this one. hashCode returns `int` hence there are only 2^32 possible permutations hence there can be only 2^32 unique elements. — specializt, Feb 15 '15 at 11:20
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/70954/discussion-between-oliver-charlesworth-and-specializt). — Oliver Charlesworth, Feb 15 '15 at 11:20

ThanksForAllTheFish · Accepted Answer · 2015-02-15T11:52:45.123

I strongly believe you are presenting a bad use case: if you need to modify an object that is in a Set, you should remove it first and re-add it after the change (or use another java.util.Collection). Taking your example:

    Set<PhoneNumber> set = new HashSet<>();
    set.add(obj);

    // Remove the object before modifying it, then re-add it
    set.remove(obj);
    obj.setNumber("089/358680-0");
    set.add(obj);

The whole purpose of hashCode is to create a bucket of similar objects to reduce the search space, so it should not change while the object is stored in the collection, and it should still be useful to you. If you use an artificial field, how do you find the object in your set later on? How do you retrieve that artificial field, given that you have no persistent storage of any kind? (The id in a database is, IMHO, an exception to the rule against artificial fields.)
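As a rough illustration of the remove, modify, re-add advice, here is a minimal sketch. The `PhoneNumber` class below is a hypothetical stand-in (the original implementation was never posted), and `RehashDemo` with its two methods is invented purely for the comparison.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the PhoneNumber class from the question;
// equals() and hashCode() are both based on the editable fields.
class PhoneNumber {
    private final String owner;
    private String number;

    PhoneNumber(String owner, String number) {
        this.owner = owner;
        this.number = number;
    }

    void setNumber(String number) {
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PhoneNumber)) return false;
        PhoneNumber other = (PhoneNumber) o;
        return owner.equals(other.owner) && number.equals(other.number);
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, number);
    }
}

class RehashDemo {
    // Mutating in place: the set still files the entry under the old hash,
    // so a lookup with the new hash misses it.
    static boolean mutateInPlace() {
        Set<PhoneNumber> set = new HashSet<>();
        PhoneNumber obj = new PhoneNumber("mgm", "089/358680");
        set.add(obj);
        obj.setNumber("089/358680-0");
        return set.contains(obj);
    }

    // Remove first, then mutate, then re-add: the entry is filed again
    // under the hash of its current state.
    static boolean removeMutateReAdd() {
        Set<PhoneNumber> set = new HashSet<>();
        PhoneNumber obj = new PhoneNumber("mgm", "089/358680");
        set.add(obj);
        set.remove(obj);
        obj.setNumber("089/358680-0");
        set.add(obj);
        return set.contains(obj);
    }

    public static void main(String[] args) {
        System.out.println("mutated in place, still found: " + mutateInPlace());
        System.out.println("removed and re-added, found: " + removeMutateReAdd());
    }
}
```

With this stand-in class, the first lookup fails and the second succeeds. Note that the order matters: calling `set.remove(obj)` after the mutation would miss the entry for the same reason `contains` does.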

To explain the meaning of

The whole purpose of hashCode is to create a bucket of similar objects to reduce the search space

have a look at this sample code: http://ideone.com/MJ2MQT. I (wrongly) created two objects with the same hash code, then added both to a set; as expected, the set contains both of them, because the hash code is only used to find the elements that collide, and the equals method is then called to resolve the collision. Collisions (read: different objects that return the same hash code) are unavoidable, and the goal of a properly designed hash code function is to reduce them as much as possible.
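The linked sample is not reproduced here, so the following is only a sketch of the same idea under my own assumptions: `CollidingKey` is a hypothetical class whose hashCode() is deliberately constant, which forces every instance into the same bucket and leaves equals() to tell them apart.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: every instance returns the same hash code on purpose.
class CollidingKey {
    private final String value;

    CollidingKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey
                && value.equals(((CollidingKey) o).value);
    }

    @Override
    public int hashCode() {
        return 42; // legal, but it degrades the set to a linear search
    }
}

class CollisionDemo {
    // Adds two distinct keys and one duplicate, then reports the set size.
    static int distinctCount() {
        Set<CollidingKey> set = new HashSet<>();
        set.add(new CollidingKey("a"));
        set.add(new CollidingKey("b")); // same hash, not equal: kept
        set.add(new CollidingKey("a")); // same hash and equal: rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("elements in set: " + distinctCount());
    }
}
```

The set ends up with two elements: the colliding hash codes only decide where to look, and equals() decides whether an element is already present.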

score 1 · Answer 2 · answered Feb 15 '15 at 11:14

Storing mutable objects in a hash set, or using them as keys in a hash map, is definitely not a good idea, precisely for the reason that you illustrate in your code.

On the other hand, defining an artificial number that serves as an ID of an object defeats the purpose of having a hash code in the first place, because it does not help you find an object that is equal to a given object by limiting the search to objects with identical hash codes.

In fact, your solution is no different from constructing a Map<Integer,PhoneNumber> from an "artificial hash code" to your mutable PhoneNumber object. If finding objects by association is what you need, a HashMap from an artificial ID to the mutable object is the way to go.
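A minimal sketch of that map-based alternative might look as follows. `EditablePhoneNumber`, `IdLookupDemo`, and the ID value `1` are all hypothetical names chosen for illustration; how IDs are actually assigned is left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable value class; it does not override equals()/hashCode()
// because it is only ever used as a map value, never as a key.
class EditablePhoneNumber {
    private String number;

    EditablePhoneNumber(String number) {
        this.number = number;
    }

    String getNumber() {
        return number;
    }

    void setNumber(String number) {
        this.number = number;
    }
}

class IdLookupDemo {
    // Stores the object under an artificial ID, edits it, and looks it up again.
    static String lookupAfterEdit() {
        Map<Integer, EditablePhoneNumber> byId = new HashMap<>();
        int id = 1; // assumed artificial ID, supplied by the caller
        EditablePhoneNumber obj = new EditablePhoneNumber("089/358680");
        byId.put(id, obj);

        // The key (the ID) never changes, so editing the value is safe.
        obj.setNumber("089/358680-0");
        return byId.get(id).getNumber();
    }

    public static void main(String[] args) {
        System.out.println("found by id: " + lookupAfterEdit());
    }
}
```

The hash is computed from the Integer key only, so the lookup keeps working however the value is edited. The cost is that the caller has to know the ID in order to find the object.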

score 0 · Answer 3 · answered Feb 15 '15 at 11:11

It usually makes sense to have a unique identifier for your data objects, especially if you are persisting them in some database. It will allow you to have an easy implementation of equals and hashCode, which will only depend on this single identifier.

I'm not sure the current time in ms. will be the best choice, but you should definitely generate some unique ID.
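To make the idea concrete, here is one possible sketch that takes the ID from an in-memory counter instead of the clock (two objects created in the same millisecond would otherwise share a value). The class names and the counter are assumptions of mine, not something prescribed above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class whose equals()/hashCode() depend only on a generated ID.
class IdentifiedPhoneNumber {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long id = NEXT_ID.incrementAndGet(); // fixed at creation
    private String number;

    IdentifiedPhoneNumber(String number) {
        this.number = number;
    }

    void setNumber(String number) {
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdentifiedPhoneNumber
                && id == ((IdentifiedPhoneNumber) o).id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

class UniqueIdDemo {
    // Editing the number does not affect the hash, so the lookup still works.
    static boolean stillFoundAfterEdit() {
        Set<IdentifiedPhoneNumber> set = new HashSet<>();
        IdentifiedPhoneNumber obj = new IdentifiedPhoneNumber("089/358680");
        set.add(obj);
        obj.setNumber("089/358680-0");
        return set.contains(obj);
    }

    // Two separately created objects with identical content get different IDs.
    static boolean sameContentIsEqual() {
        return new IdentifiedPhoneNumber("089/358680")
                .equals(new IdentifiedPhoneNumber("089/358680"));
    }

    public static void main(String[] args) {
        System.out.println("found after edit: " + stillFoundAfterEdit());
        System.out.println("same content equal: " + sameContentIsEqual());
    }
}
```

The edited object is still found, but two objects with the same number are never equal, so the set no longer removes duplicates by content. Whether that trade-off is acceptable depends on what the set is being used for.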

answered Feb 15 '15 at 11:11

Eran


While I agree on having a unique identifier for databases, I tend to disagree that it's a good approach for what OP is doing. The fact that he is using `Set` makes me feel he is expecting duplicate objects, and wants to retain only unique ones. Basing hashCode (and hence equals) on unique identifiers will ensure that no two objects are ever equal, even if they have identical real content. – UltraInstinct Feb 15 '15 at 11:16
@Thrustmaster That depends on when the unique identifier is generated. If it's only generated when a new object is created (as opposed to an existing object being loaded from the DB or from some other persistent store), it will make sense. If, on the other hand, it's generated for any new instance of the class, it will make no sense at all, since in that case you can simply use the default implementation of Object's equals (==) and hashCode. – Eran Feb 15 '15 at 11:22
Yes, that's exactly why I agreed with you on the database part. With regards to the question posted, there's no mention of database at all. As for the second part of your comment, the OP will still need to override equals() & hashCode() and can not use Object's default implementation. – UltraInstinct Feb 15 '15 at 11:26
@Thrustmaster Actually, if each object being instantiated gets a new generated identifier (in which case I think we both agree this identifier will be useless), a.equals(b) will return true if and only if a == b. That's why I said the default implementation of equals will behave the same as comparing the generated identifiers of two objects. – Eran Feb 15 '15 at 11:31
