21

Using both Java 8 and Java 11, consider the following TreeSet with a String::compareToIgnoreCase comparator:

final Set<String> languages = new TreeSet<>(String::compareToIgnoreCase);
languages.add("java");
languages.add("c++");
languages.add("python");

System.out.println(languages);                 // [c++, java, python]

When I try to remove the exact elements present in the TreeSet, it works: all of those specified are removed:

languages.removeAll(Arrays.asList("PYTHON", "C++"));

System.out.println(languages);                 // [java]

However, if I try to remove instead more than is present in the TreeSet, the call doesn't remove anything at all (this is not a subsequent call but called instead of the snippet above):

languages.removeAll(Arrays.asList("PYTHON", "C++", "LISP"));

System.out.println(languages);                 // [c++, java, python]

What am I doing wrong? Why does it behave this way?

Edit: String::compareToIgnoreCase is a valid comparator:

(l, r) -> l.compareToIgnoreCase(r)
John Kugelman
  • 349,597
  • 67
  • 533
  • 578
Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
  • 5
    Related bug entry: https://bugs.openjdk.java.net/browse/JDK-8180409 (TreeSet removeAll inconsistent behaviour with String.CASE_INSENSITIVE_ORDER) – Progman Jan 01 '20 at 14:04
  • A closely related [Q&A](https://stackoverflow.com/questions/28671903/the-hashsett-removeall-method-is-surprisingly-slow). – Naman Jan 01 '20 at 18:27
  • Related: https://stackoverflow.com/questions/18123178/unintuitive-behavior-of-removeall-method-in-sets/18124076 – Didier L Jan 06 '20 at 12:41

1 Answers1

22

Here's the javadoc of removeAll():

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

In your second experiment, you're in the first case of the javadoc. So it iterates over "java", "c++", etc. and checks if they're contained in the Set returned by Set.of("PYTHON", "C++"). They're not, so they're not removed. Use another TreeSet using the same comparator as argument, and it should work fine. Using two different Set implementations, one using equals(), and the other one using a comparator, is a dangerous thing to do indeed.

Note that there is a bug opened about this: [JDK-8180409] TreeSet removeAll inconsistent behaviour with String.CASE_INSENSITIVE_ORDER.

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
  • Do you mean when both sets would have the same characteristics, it works? `final Set subLanguages = new TreeSet<>(String::compareToIgnoreCase);` `subLanguages.addAll(Arrays.asList("PYTHON", "C++", "LISP"));` `languages.removeAll(subLanguages);` – Nikolas Charalambidis Jan 01 '20 at 13:52
  • 1
    You're in the case "If this set has fewer elements", described by the javadoc. The other case being "If the specified collection has fewer elements". – JB Nizet Jan 01 '20 at 13:54
  • 8
    This answer is right, but it is very unintuitive behavior. It feels like a flaw in the design of `TreeSet`. – Boann Jan 01 '20 at 14:55
  • I agree, but I can't do anything about that. – JB Nizet Jan 01 '20 at 14:58
  • Unless someone can find a case use for this anomally I would expect it could be fixed with minimal repercussions to existing code. – WJS Jan 01 '20 at 16:02
  • Finally, is it a bug or just a very unintuitive feature with correctly documented behavior? – Nikolas Charalambidis Jan 02 '20 at 09:44
  • 5
    It's both: it's a very unintuitive behavior that is correctly documented, but, being unintuitive and deceiving, it's also a design bug that might, some day, be fixed. – JB Nizet Jan 02 '20 at 09:55