45

While working with a tree set, I found very peculiar behavior.

As per my understanding following program should print two identical lines:

public class TestSet {
    static void test(String... args) {
        Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b"));
        s.removeAll(Arrays.asList(args));
        System.out.println(s);
    }

    public static void main(String[] args) {
        test("A");
        test("A", "C");
    }
}

but strangely it prints:

[b]
[a, b] 

I am unable to understand - Why is tree set behaving like this?

T-Bag
  • 10,916
  • 3
  • 54
  • 118
  • 4
    looks like a bug... – Maurice Perry May 11 '17 at 13:31
  • 2
    The crazy thing is it shouldn't even remove the `"a"` String at all, even in the first invocation. The ordering determines the sorting within the `TreeSet`, but equality should be used to determine what to remove, and `"a"` is not equal to `"A"`. Running this actually made me go "what the f-". – G_H May 11 '17 at 13:32
  • 4
    @G_H in a treeset the equality is fully defined by the comparator, if they compare equal then they are considered equal – ratchet freak May 11 '17 at 15:49
  • @ratchetfreak--you are right. – T-Bag May 11 '17 at 16:38
  • @ratchetfreak Yup, after reviewing the docs it's clear. I had expected it to conform to the contract for a Set, given that it implements that interface, but SortedSet overrules it with different rules. – G_H May 11 '17 at 21:07

4 Answers4

42

This happens because a SortedSet’s Comparator is used for sorting, but removeAll relies on the equals method of each element. From the SortedSet documentation:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

The explanation of “consistent with equals” is defined in the Comparable documentation:

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

In summary, your Set’s Comparator behaves differently than the elements’ equals method, causing unusual (though predictable) behavior.

VGR
  • 40,506
  • 4
  • 48
  • 63
  • 2
    Small remark, `removeAll` relies on `contains` under normal circumstances as you can see in Shadov's answer, which relies on `equals`, but depending on relative collection size it may just call `remove` which in a `TreeSet` depends only on the comparator, not equality. So actually your first sentence needs to be adjusted, because it only relies on `equals` in SOME circumstances. – G_H May 11 '17 at 13:44
  • 1
    @G_H That is an implementation detail. Collection implementations have been known to change significantly between releases (and in fact that happened for Java 8); it’s not advisable to rely on implementation details. Rather, rely on the contract—that is, the javadoc—which is stable across releases. – VGR May 11 '17 at 13:48
  • 5
    The issue here is exactly that relying on the contract produces an unexpected result. The sizes of the collection on which the method is invoked and the argument determine which method is used (iterate over `this` or the argument). Depending on that, removal is either determined via `this.remove(argumentElement)`or via `argument.contains(elementFromThis)`. So it **can** but doesn't always depend on the `contains()` semantics of the given argument, while `remove()` is also expected to operate on `contains()`, but for a `TreeSet` it doesn't, it operates on the `compareTo()` values. ... – G_H May 11 '17 at 13:56
  • 5
    Basically `AbstractSet` works on the assumption that `contains()` and `remove()` have the same semantics for every collection (equality checks) but then `TreeSet` which inherits from it goes and violates that contract. Just because it mentions this doesn't make it okay, and certainly not when the results are inconsistent (even though predictable). The outcome here depends on implementation details rather than the semantics. – G_H May 11 '17 at 13:56
  • 1
    @G_H it's not implementation details but documented: "a sorted set performs *all element comparisons* using its compareTo (or compare) method, *so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal*." This means that every method in AbstractSet that uses `equals` has been overridden to use `compare==0` instead. – ratchet freak May 11 '17 at 15:56
  • @G_H An API can certainly demand certain properties to hold for it to work correctly. "Passed in Comparator has to agree with equals() for Set semantics to hold" is a perfectly valid limitation. I agree though that it's incredibly bad API design. – Voo May 11 '17 at 19:51
  • 2
    To highlight a point in G_H's comment, if you pass another TreeSet(String.CASE_INSENSITIVE_ORDER) as the parameter to removeAll (instead of the list), it works as expected, because both collections now adhere to the same definition of contains(). – Scott A Miller May 11 '17 at 19:51
  • 2
    I'm just surprised that `TreeSet` doesn't override `removeAll()` to use comparator equality. – shmosel May 11 '17 at 20:32
  • @ratchetfreak You are correct. However, the behavior in the question has an outcome depending on an implementation detail, namely the optimization for which collection to iterate over when calling removeAll, which comes from AbstractSet. If you search for TreeSet behavior you'll find lots of people (including many questions on SO) being surprised by it. So I agree it's bad API design: if enough people run into trouble with the API's contract, the API is at fault, not the devs. – G_H May 11 '17 at 21:12
15

Well, this surprised me, I don't know if I'm correct, but look at this implementation in AbstractSet:

public boolean removeAll(Collection<?> c) {
  Objects.requireNonNull(c);
  boolean modified = false;

  if (size() > c.size()) {
    for (Iterator<?> i = c.iterator(); i.hasNext(); )
      modified |= remove(i.next());
  } else {
    for (Iterator<?> i = iterator(); i.hasNext(); ) {
      if (c.contains(i.next())) {
        i.remove();
        modified = true;
      }
    }
  }
  return modified;
}

Basically in your example, the size of set is equal to the size of arguments you want to remove, so the else condition is invoked. In that condition there is a check if your collection of arguments to remove contains the current element of iterator, and that check is case sensitive, so it checks if c.contains("a") and it returns false, because c contains "A", not "a", so the element is not removed. Notice that when you add an element to your set s.addAll(Arrays.asList("a", "b", "d")); it works correctly, because size() > c.size() is now true, thus there is no contains check anymore.

Shadov
  • 5,421
  • 2
  • 19
  • 38
  • Shouldn't the `remove(Object o)` method actually check for equality? Or does TreeSet assume that if compareTo returns 0 it implies equality? That seems a bit weird, just because one thing is considered to be equivalent to another in ordering doesn't mean it's the same. – G_H May 11 '17 at 13:37
  • Honestly, I don't know, maybe it's intentional, but since it behaves differently based on length of the original set like I said before, then it may in fact be a bug. – Shadov May 11 '17 at 13:37
  • 2
    From the Javadoc for SortedSet: "Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) ... – G_H May 11 '17 at 13:39
  • 2
    This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface." – G_H May 11 '17 at 13:39
  • I can accept the lack of unsigned data types, but using two different methods of removal depending on the size of the `Collection` is outright ridiculous. – Jacob G. May 11 '17 at 13:40
  • 1
    And from Comparator: "The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S." Looks like this is indeed by design. Pretty tricky, this could come straight out of Java Puzzlers. – G_H May 11 '17 at 13:40
  • Interesting: If you use instead `s.addAll(Arrays.asList("a", "b", "x", "y"));` it all works as expected. Supporting your suggestion. – OldCurmudgeon May 11 '17 at 13:42
  • 2
    @JacobG. Checking which collection is smallest and iterating over that one (either `this` or the argument) is an optimization to avoid iterating over a 10000-element argument when `this` contains one element. But the unexpected side-effect here is pretty dramatic. – G_H May 11 '17 at 13:46
3

To add some information about why the remove of TreeSet actually removes case-insensively in your example (and provided that you follow the if (size() > c.size()) path as explained in the answer by @Shadov) :

This is the removemethod in TreeSet :

public boolean remove(Object o) {
        return m.remove(o)==PRESENT;
    }

it calls remove from its internal TreeMap :

public V remove(Object key) {
    Entry<K,V> p = getEntry(key);
    if (p == null)
        return null;

    V oldValue = p.value;
    deleteEntry(p);
    return oldValue;
}

which calls getEntry

 final Entry<K,V> getEntry(Object key) {
        // Offload comparator-based version for sake of performance
        if (comparator != null)
            return getEntryUsingComparator(key);
        if (key == null)
            throw new NullPointerException();
        @SuppressWarnings("unchecked")
            Comparable<? super K> k = (Comparable<? super K>) key;
        Entry<K,V> p = root;
        while (p != null) {
            int cmp = k.compareTo(p.key);
            if (cmp < 0)
                p = p.left;
            else if (cmp > 0)
                p = p.right;
            else
                return p;
        }
        return null;
    }

If there is a Comparator (as in your example), the entry is searched based on this Comparator (this is done by getEntryUsingComparator), that's why it is actually found (then removed) , despite the case difference.

Arnaud
  • 17,229
  • 3
  • 31
  • 44
0

This is interesting, so here are some tests with output:

static void test(String... args) {
    Set<String> s =new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    s.addAll(Arrays.asList( "a","b","c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}

public static void main(String[] args) {
    test("C");          output: [a, b]
    test("C", "A");     output: [b]
    test("C", "A","B"); output: [a, b, c]
    test("B","C","A");  output: [a, b, c]
    test("K","C");      output: [a, b]
    test("C","K","M");  output: [a, b, c] !!
    test("C","K","A");  output: [a, b, c] !!
}

Now without the comparator it works just like a sorted HashSet<String>():

    static void test(String... args) {
    Set<String> s = new TreeSet<String>();//
    s.addAll(Arrays.asList( "a","b","c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}

public static void main(String[] args) {
    test("c");          output: [a, b]
    test("c", "a");     output: [b]
    test("c", "a","b"); output: []
    test("b","c","a");  output: []
    test("k","c");      output: [a, b]
    test("c","k","m");  output: [a, b]
    test("c","k","m");  output: [a, b]
}

Now from the documentation:

public boolean removeAll(Collection c)

Removes from this set all of its elements that are contained in the specified collection (optional operation). If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

Source

TiyebM
  • 2,684
  • 3
  • 40
  • 66