
I need to collect the values of a list into buckets. For example, assume I have a list of strings:

List<String> strs = Arrays.asList("ABC", "abc", "bca", "BCa", "AbC");

I want to put the strings into a set (or list) of sets, where each set contains only strings that differ in case. For the example above, the result would be a collection of two sets: [["ABC", "abc", "AbC"], ["bca", "BCa"]]

Please help me write a collector for this problem.

List<Set<String>> result = strs.stream()
                .collect(/* some collectors magic here */);
Ilya
  • I don't understand the result. `"ABC"` is all uppercase and it is in the same group as `"AbC"`? Could you explain further "that contain only case-different strings"? – Tunaki Jan 26 '16 at 20:15
  • All strings in a bucket should be equal ignoring case, but of course they can differ if we compare them as usual. (`"ABC".equalsIgnoreCase("AbC") == true` but `"ABC".equals("AbC") == false`) – Ilya Jan 26 '16 at 20:17
  • Not sure you want to jump straight to a `Collector` here. Perhaps going through a `Map` of `Set`s, then dump the values into a `List`? – Erick G. Hagstrom Jan 26 '16 at 20:21
  • Collect (in the normal way) into a Guava multimap built using a case-insensitive map (factory) as the collection supplier. Gives you a different datatype than you specify (a `Map` instead of a `List`) but if that's not important to you then ... – davidbak Jan 26 '16 at 20:32
  • If this question is really specifically about collections of strings and grouping them while ignoring case please change the title to say so. – glts Jan 26 '16 at 21:27
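
The `Map` of `Set`s route suggested in the comments could look roughly like the sketch below. The class and method names are made up for illustration, and lowercasing with `Locale.ROOT` is an assumption about what "equal ignoring case" should mean here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MapOfSetsBuckets {

    // Hypothetical helper: one bucket per lowercased key, no streams involved.
    static List<Set<String>> group(List<String> strs) {
        Map<String, Set<String>> buckets = new HashMap<>();
        for (String s : strs) {
            // Create the bucket on first use, then add the original string to it
            buckets.computeIfAbsent(s.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(s);
        }
        // The keys are no longer needed; copy the buckets into a list
        return new ArrayList<>(buckets.values());
    }

    public static void main(String[] args) {
        List<String> strs = Arrays.asList("ABC", "abc", "bca", "BCa", "AbC");
        // Bucket order and element order inside each bucket are unspecified
        System.out.println(group(strs));
    }
}
```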

3 Answers


The "some collectors magic" you are looking for can be done in two steps:

  • First, group the elements by the property you care about. Since you want to ignore case, String#toLowerCase does the job (don't forget the overload that takes a Locale as a parameter). You also want the grouped values to be unique, so use the overloaded version of groupingBy that accepts a downstream collector and put them into a Set (by default the values are collected into a List).
  • Since you're not interested in the keys, just grab the values from the resulting map and put them in a list (if you really need a list) using the collectingAndThen collector.

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toSet;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

...

List<Set<String>> result = 
    strs.stream()
        .collect(collectingAndThen(groupingBy(String::toLowerCase, toSet()), 
                                   m -> new ArrayList<>(m.values())));
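
For reference, here is a self-contained variant of the snippet above that uses the `Locale` overload mentioned in the first step. Treat it as a sketch: the class and method names are hypothetical, and `Locale.ROOT` is just one reasonable choice for a locale-independent key.

```java
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toSet;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CaseBuckets {

    // Hypothetical helper: groups strings that are equal once lowercased.
    static List<Set<String>> bucketsIgnoringCase(List<String> strs) {
        return strs.stream()
                .collect(collectingAndThen(
                        // Locale.ROOT keeps the key independent of the JVM's default locale
                        groupingBy((String s) -> s.toLowerCase(Locale.ROOT), toSet()),
                        // Drop the keys and keep only the buckets
                        m -> new ArrayList<>(m.values())));
    }

    public static void main(String[] args) {
        List<String> strs = Arrays.asList("ABC", "abc", "bca", "BCa", "AbC");
        // Two buckets; their order and the order of their elements is unspecified
        System.out.println(bucketsIgnoringCase(strs));
    }
}
```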
Alexis C.
  • This is a lot cleaner in my opinion. – Tunaki Jan 26 '16 at 20:48
  • Probably worth noting that this supports only a naive form of case-insensitivity, not as defined in Unicode (as I learnt just the other day http://stackoverflow.com/a/34966206). – glts Jan 26 '16 at 21:25
  • @glts: if you need sophisticated case insensitivity, you may use a [`Collator`](https://docs.oracle.com/javase/8/docs/api/?java/text/Collator.html). So you don’t need a 3rd party library every time. However, even for the simple case insensitivity, just converting to lowercase is not sufficient. It’s better to group into a `TreeMap<>(String.CASE_INSENSITIVE_ORDER)`. And since the resulting order has no meaning here, a `Set` would be more appropriate for the result than a `List`. – Holger Jan 27 '16 at 10:13
  • @Holger Yes, but as far as I understand `Collator` does not do Unicode case folding out of the box. It also compares `"world"` and `"W-OR LD"` as equal which may not be what you want. – glts Jan 27 '16 at 11:03
  • @glts: since supporting Unicode case folding is one of the primary goals, I’d expect it to do that. Maybe you got confused by the statement that such things are implemented by (non-public) subclasses. That shouldn’t bother you, just use the [factory method](https://docs.oracle.com/javase/8/docs/api/java/text/Collator.html#getInstance--) to get an implementation. The treatment of spacing and punctuation characters might be an issue, indeed. – Holger Jan 27 '16 at 11:18
  • @Holger I believe `Collator` is for doing Unicode collation, ie locale-dependent string comparison and ordering as in library catalogues, indexes, and so on. Case folding is locale-independent and intended for caseless (case-insensitive) matching and searching. They cover different use cases. I think for case folding you really do have to use ICU. Anyway, this is off-topic so I'm stepping away now – thanks for the conversation. – glts Jan 27 '16 at 20:45
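
The `TreeMap<>(String.CASE_INSENSITIVE_ORDER)` idea from the comments above might be sketched as follows. This is an illustration rather than anyone's posted code: the class and method names are invented, and the result is copied into a `List` only to match the type asked for in the question.

```java
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toSet;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class TreeMapBuckets {

    // Hypothetical helper: the map's comparator, not the key, decides which strings share a bucket.
    static List<Set<String>> group(List<String> strs) {
        Map<String, Set<String>> byKey = strs.stream()
                .collect(groupingBy(
                        Function.identity(),                                // each string is its own key
                        () -> new TreeMap<>(String.CASE_INSENSITIVE_ORDER), // keys that compare as 0 collapse
                        toSet()));
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> strs = Arrays.asList("ABC", "abc", "bca", "BCa", "AbC");
        System.out.println(group(strs));
    }
}
```

Because the comparator does the matching, no lowercased copy of each string is created. Note that `String.CASE_INSENSITIVE_ORDER` is documented as not taking locale into account.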

Try:

List<Set<String>> result = 
  strs.stream()
      .collect(groupingBy(String::toLowerCase, toSet())) // Map<String, Set<String>>
      .values()            // Collection<Set<String>>
      .stream()            // Stream<Set<String>> 
      .collect(toList());  // List<Set<String>>
Jean Logeart
  • Is there no way to do it in one stream? – Ilya Jan 26 '16 at 20:28
  • @AlexisC. I don't know why *collectingAndThen* ... *groupingBy* was important enough to be put in the Streams API (over other things that were omitted) but it is sure useful on occasion. – davidbak Jan 26 '16 at 20:28

Here is a solution using abacus-common:

List<String> strs = N.asList("ABC", "abc", "bca", "BCa", "AbC");
List<Set<String>> lists = Stream.of(strs).groupBy(N::toLowerCase, Collectors.toSet()).map(Entry::getValue).toList();

Disclosure: I'm the developer of abacus-common.

user_3380739