3

I'm trying to remove an element from HashSet, but it won't.

When calling .contains(obj), it returns true, so .contains knows the object is in HashSet. But when I call .remove(obj), the object isn't removed from the HashSet and it returns false.

hashset object remove problem

Code of object - https://gist.github.com/rincew1nd/f97f11d21ba41d2f4197f9a9da02fea1

Sergei Bubenshchikov
  • 5,275
  • 3
  • 33
  • 60
Rincew1nd
  • 196
  • 5
  • 20

1 Answers1

8

This is because you haven't overriden Object#hashCode() in your class.

The implementation of hashCode() in Object returns a value that is computed from the calling objects memory address . So unless you pass a reference to the object contained in the map that you'd like to remove using remove(Object o) then the HashSet won't be able to locate the Object.
NOTE: Reference is not the same as a equal object defined by o1.equals(o2)

See bottom of post for explanation as to why 'HashSet#contains(Object o)' still works even when Objects being stored in the map haven't overriden Object#hashCode()


Say I have a class MyClass that doesn't override hashCode()
MyClass instance1 = new MyClass(); // Memory Address = 0x01
MyClass instance2 = new MyClass(); // Memory Address = 0x02
instance1.equals(instance2) == false

I can add these unique objects to a HashSet no problems, as adding to a HashSet only relies on equals(Object o), not hashCode().

Now let's say I want to remove them from the Set but I lost my original references to them, I do have the following references though:
instance3 // Memory Address = 0x03
instance4 // Memory Address = 0x04
instance1.equals(instance3) == true
instance2.equals(instance4) == true

A call to nonHashSet.remove(instance3) will remove instance1 from the Set. But Hash Based Sets work completely different: the hashCode (value computed from the memory address since we used Object#hashCode()) is used to locate the item. So when I ask the HashSet to remove instance3 it tells me that there is no element at that hash index. This is because the memory address of instance1 and instance3 are different, thus the hash for each is different meaning.

This is really bad as now the only reliable ways to remove instance1 or instance2 from the set is to clear the whole set or of course reinitialise it.


What happens under the hood in a HashSet:

When you attempt to add an element first the HashSet checks if another element considered equal to the element being added is already contained. If the element being added is unique to all other elements, it is added to the set.

How a HashSet differs from say a NonHashSet is how the elements are stored and retrieved.

In a HashSet, after an element is added to the set (as in when add(E element) returns true) a hash of that element is produced by calling element.hashCode(). In very loose terms you can think of the hash as the index that the element will be stored at.

Visual representation of storing elements in a hash based collection

Retrieved from https://en.wikipedia.org/wiki/Hash_table

Where key == element, hash function == element.hashCode(), buckets == indexes in the way I described it above

A note regarding Buckets: unlike traditional storage where each element gets its own index, it's not uncommon for two different elements (not considered equal to each other) to produce the same hash. When this happens a collision handling mechanism is used to store elements that have the same hash. This is beyond the scope of this question, but a common and simple way to handle this is to make a List at the collision hash index that stores all elements that contain the same hash.

When removing an element from a hash based set, the element that we are passing as the argument to be removed has it's hash calculated, then that hash is used to locate the element. If there is more than one element at the hash index (occurs when there are elements that have the same hash) then the collision handling mechanism the HashSet uses is used to find the exact item we are looking for. In the above specified mechanism (store a List at the collision index to store each collided element) elements.equals(collidedElement)

The general contract a hashCode() implementation should conform to: Hash Code Contract Java SE 8

Further information: Why do I need to override the equals and hashCode methods in Java?

Simple way to implement a EFFECTIVE hash function: Best implementation for hashCode method

How contains(Object o) Still Works:

Let's take a look at the code (Java Version 7u40-b43) that determines the return value ofcontains(Object o)in HashSet class. I've added comments to describe what is happening:

// contains(Object o) will return true if this method returns non-null
// NOTE: Entry<K,V> seems out of place as we are dealing with a Set not a 
//       Map. But in fact the backing structure for HashSet is a type of Map
final Entry<K,V> getEntry(Object key) {
        if (size == 0) { // Set is empty, search element can't be contained
            return null;
        }

        // Calculate the hash of the search key:
        // IF the search key == null --> hash = 0
        // ELSE calculate and assign the hashCode for key inside method 
        //      hash()
        int hash = (key == null) ? 0 : hash(key);

        // Start at hash calculated, after each iteration set e to be 
        // the element that comes after e  (defined by e.next field). Stop 
        // iterating when e is null
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // IF (the hash calculated from the search key and the hash 
            //     calculated from this Entries key are equal AND
            //     this Entries key equals the search key) OR
            //    (search key != null AND the search key equals this Entries 
            //     key)
            // then return entry --> contains returns true
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
Community
  • 1
  • 1
FireLight
  • 325
  • 3
  • 10
  • `hashCode` does *not* return the memory address of the object. It's a digested value (=hash) of the data stored in the object. – msp May 19 '17 at 06:34
  • I'm speaking of the `Object.hashCode()` implementation, not an overriden `hashCode()` implementation. In any case, if you believe I'm incorrect please provide a reference to back this up. – FireLight May 19 '17 at 07:20
  • 1
    `Object#hashCode` is nowhere documented to return the memory address of an object. Instead of asking someone to disprove your claim, @Wrap2Win, as the person making the claim you have the burden. The only possible support is a Javadoc entry that "This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language." The exact conversion is not specified, and it isn't required, and the result is not a memory address but an integer. So, no, `hashCode` does not return an address. – Lew Bloch May 19 '17 at 07:42
  • You haven't explained the discrepancy between `contains()` and `remove()`. – shmosel May 19 '17 at 07:43
  • 1
    @LewBloch I've been awake for 24 hours so it didn't really click that "`Object.hashCode()` returns the memory address` is obviously not the correct way to state it. In fact it is gross misrepresentation of the return value especially if you are comfortable with a language where you have direct access to pointers - I can see how to people with that understanding of pointers what I've said is a massive oversight! Thank you for clearing that up for me. I apologise @msparer if I came across rude, no sleep can do that to you! I'll make the relevant edits. – FireLight May 19 '17 at 08:14
  • @LewBloch *"This is typically implemented by converting the internal address of the object into an integer"* This might not be an actual RAM address, but it is still an address used for the hashcode. – Tom May 19 '17 at 09:07
  • @shmosel updated the answer to include at the bottom a code analysis at to why contains() will still work even if Object#hashCode() is being used on elements – FireLight May 19 '17 at 09:24
  • @LewBloch when should I use `Object#method()` instead of `Object.method()` to reference a method – FireLight May 19 '17 at 09:26
  • @Tom, I cited that exact quote, as I am sure you noticed, and explained why it does not mean that what `hashCode` returns is an address. All that line says is that the implementation _might_ start from a representation of the address. You're aware that the address of an object changes with a GC cycle, but the hashcode doesn't? And even starting with an address is only a suggestion; it isn't guaranteed, as the part of the quote you omitted points out. Bottom line: `hashCode` does not return an address, it returns an `int`. – Lew Bloch May 19 '17 at 09:29
  • @LewBloch You should stop putting words in someone elses mouth. My quote says what it says and nothing else. Neither have I said that it _is_ what `hashCode` returns (it just can use the internal address for building it) nor that it _must_ use the address. – Tom May 19 '17 at 09:32
  • Based on conventions in the industry press and Oracle documentation, I use `Type#method` to refer to an instance method, and `Type.method` to refer to a static method, but that's just me. I don't claim that it's standard. – Lew Bloch May 19 '17 at 09:32